vsl9 (Collaborator) commented May 24, 2023

What does this PR do?

This PR replaces jiwer with the editdistance package to speed up character error rate (CER) estimation in audio-based text normalization (TN).

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. Run pytest, or pytest --cpu if your machine does not have a GPU, from the root folder (provided you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Run the Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you reviewed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

vsl9 and others added 2 commits May 24, 2023 13:43
ekmb (Collaborator) left a comment

Thank you!

ekmb merged commit 78293a1 into NVIDIA:main May 27, 2023
BuyuanCui pushed a commit to BuyuanCui/NeMo-text-processing that referenced this pull request Jul 6, 2023
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
mgrafu pushed a commit that referenced this pull request Jul 18, 2023
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
BuyuanCui pushed a commit that referenced this pull request Dec 12, 2023
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui pushed a commit that referenced this pull request Feb 16, 2024
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
ekmb added a commit that referenced this pull request Apr 30, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranged for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in issue #162 (bug in graph_utils.py of zh ITN and decimal tagger of ar TN).

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style. 

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Sign…
BuyuanCui added a commit that referenced this pull request Jul 12, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding anands updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing bug raised in bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Jul 25, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in "bug in graph_utils.py of zh ITN and decimal tagger of ar TN" #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing the imported punct graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
tbartley94 added a commit that referenced this pull request Aug 16, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing __init__ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranged for Chinese yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers …
BuyuanCui added a commit that referenced this pull request Aug 20, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
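
The CER (character error rate) that this change speeds up is an edit distance divided by the reference length. A minimal sketch of that computation follows. It is not the code from the PR: the function names `levenshtein` and `cer` are hypothetical, and the distance is computed in pure Python so the example has no dependencies. With the `editdistance` package installed, `editdistance.eval(reference, hypothesis)` could presumably stand in for `levenshtein`.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences, computed row by row (hypothetical helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: edits / reference length * 100."""
    if not reference:
        # Assumption: an empty reference scores 0 only against an empty hypothesis.
        return 0.0 if not hypothesis else 100.0
    return levenshtein(reference, hypothesis) * 100.0 / len(reference)
```

Calling `cer("abcd", "abxd")` counts one substitution over four reference characters. Operating on raw characters like this avoids any per-call text transformation pipeline, which is one plausible reason a dedicated edit-distance routine is faster for this use.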

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in issue #162 (bug in graph_utils.py of zh ITN and decimal tagger of ar TN).

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Aug 20, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing __init__ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
BuyuanCui pushed a commit that referenced this pull request Sep 19, 2024
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
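
For readers unfamiliar with the metric, the commit above concerns character error rate (CER), which is an edit distance between two strings normalized by the reference length. The following is a minimal pure-Python sketch of that computation, not the code from this repository: `levenshtein` and `cer` are hypothetical helper names, and the percent scaling and the handling of an empty reference are assumptions made for illustration. The `editdistance` package provides `editdistance.eval(a, b)`, which returns the same distance as `levenshtein` here from a compiled implementation.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        # Assumption: an empty reference scores 0 only against an empty hypothesis.
        return 0.0 if not hyp else 100.0
    return 100.0 * levenshtein(ref, hyp) / len(ref)
```

In audio-based text normalization this kind of score is computed for many candidate normalizations against an ASR transcript, so the cost of a single distance call adds up quickly, which is the motivation the PR description gives for the swap.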
BuyuanCui added a commit that referenced this pull request Sep 19, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple Mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug @xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed unused

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tn/itn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing used graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing used comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in #162: bug in graph_utils.py of zh ITN and decimal tagger of ar TN.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Sep 19, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing _init_ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to the document on the national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
BuyuanCui pushed a commit that referenced this pull request Sep 26, 2024
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Sep 26, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to the national guideline document

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented-out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combining into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes a long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing used graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing used comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in "bug in graph_utils.py of zh ITN and decimal tagger of ar TN" #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Sep 26, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing _init_ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixings according to document on natonal gideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
BuyuanCui added a commit that referenced this pull request Sep 26, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money, updated according to last PR. Plus processing of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple Mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding anands updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing used comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in "bug in graph_utils.py of zh ITN and decimal tagger of ar TN" #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Sep 26, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money, updated according to last PR. Plus processing of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple Mandarin expressions

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to the document on the national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding Anand's updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing the bug raised in "bug in graph_utils.py of zh ITN and decimal tagger of ar TN" (#162).

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Sep 26, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing _init_ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to the national guideline document

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
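
The "File is not always closed" warning addressed above usually means a file was opened without a guaranteed close on every code path. A minimal sketch of the usual remedy follows; the helper name `read_lines` is hypothetical and is not taken from the repository.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file and return its lines without trailing newlines.

    Hypothetical helper for illustration. The `with` block closes the
    file even if an exception is raised while reading.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Small self-contained demonstration using a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.tsv")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write("one\t1\ntwo\t2\n")
    demo_lines = read_lines(demo_path)
```

Using a context manager instead of a bare `open(...)` call is the pattern static analyzers generally expect for this warning.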

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
BuyuanCui added a commit that referenced this pull request Oct 16, 2024
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
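
For readers unfamiliar with the metric in the entry above: character error rate (CER) is the character-level edit distance between a reference and a hypothesis, divided by the reference length. The sketch below is a pure-Python illustration, not the code from this PR. The function names `levenshtein` and `cer` and the handling of an empty reference are assumptions made for the example.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences, using a two-row dynamic program."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Assumption for this sketch: an empty reference yields 0.0 when the
    hypothesis is also empty and 1.0 otherwise.
    """
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


distance = levenshtein("kitten", "sitting")
rate = cer("abcd", "abxd")
```

The `editdistance` package exposes `editdistance.eval(a, b)`, which returns the same Levenshtein distance from a compiled implementation, so it could stand in for the pure-Python `levenshtein` helper when speed matters.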

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path in for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranged for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding anands updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing used graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing used comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch ( which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing bug raised in bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui added a commit that referenced this pull request Oct 16, 2024
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing _init_ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranged for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money, updated according to last PR. Plus processing of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to the document on the national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 24, 2024
* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue, changing narrow space to regular normal space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes to attempt to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes to attempt to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2- and 3-digit numbers without a leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
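
The CER estimation change (#73) in the commit list above replaces jiwer with editdistance. As a rough illustration of the quantity being computed, here is a minimal pure-Python sketch of character error rate: the character-level Levenshtein distance divided by the reference length. The function names `levenshtein` and `cer` and the empty-reference convention are assumptions made for this example, not the repository's code. In practice the `editdistance` package's `editdistance.eval(a, b)` call would likely take the place of the hand-written distance function.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences using a two-row dynamic program."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete r from ref
                    curr[j - 1] + 1,     # insert h into ref
                    prev[j - 1] + cost,  # substitute (or match when cost is 0)
                )
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        # Assumed convention for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty as well.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("twenty three", "twenty tree"))
```

The two-row formulation keeps memory proportional to the hypothesis length. A compiled implementation of the same recurrence is generally much faster than an interpreted loop like this one, which is presumably the motivation for the change.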
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
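
For context on the change above, the following is a minimal sketch of character error rate (CER) computed from an edit distance. It is not the code from this PR: the function names `cer` and `_levenshtein` are hypothetical, and the sketch only assumes that the `editdistance` package exposes `editdistance.eval(a, b)`. If the package is not installed, a pure-Python Levenshtein fallback is used.

```python
# Hypothetical sketch; names below are illustrative, not taken from the PR.
try:
    import editdistance  # optional dependency

    def _distance(ref: str, hyp: str) -> int:
        # editdistance.eval returns the Levenshtein distance between two sequences.
        return editdistance.eval(ref, hyp)

except ImportError:

    def _levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance, keeping one row at a time.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(
                    min(
                        prev[j] + 1,  # deletion
                        cur[j - 1] + 1,  # insertion
                        prev[j - 1] + (ca != cb),  # substitution or match
                    )
                )
            prev = cur
        return prev[-1]

    def _distance(ref: str, hyp: str) -> int:
        return _levenshtein(ref, hyp)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return _distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))
```

Calling the edit-distance routine directly on character strings avoids any per-call text preprocessing, which is one plausible reason a lighter-weight package would be faster here.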

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempt for SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempt for SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempting to resolve SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempting to resolve SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
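
As an illustration of the print-to-log change named in the entry above, here is a minimal sketch using Python's standard `logging` module. The logger name `tn_example`, the function `cache_exists`, and its messages are hypothetical and are not taken from `graph_utils.py`; the point is only that a log call, unlike `print`, can be filtered or silenced by the caller.

```python
import logging
import os

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("tn_example")


def cache_exists(far_path: str) -> bool:
    """Return True if a cached grammar file exists, logging the outcome.

    Assumption: this helper is illustrative and does not mirror any
    function in the repository.
    """
    if os.path.exists(far_path):
        # Lazy %-style formatting: the message is only built if emitted.
        logger.info("Restoring grammar from %s", far_path)
        return True
    logger.info("No cache at %s; the grammar would be rebuilt", far_path)
    return False
```

A caller that wants the messages can run `logging.basicConfig(level=logging.INFO)`; a library user who does not can leave the default level in place and see nothing.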

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
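
The entry above swaps jiwer for editdistance when estimating CER. A minimal sketch of that idea follows, assuming CER is defined as the character-level Levenshtein distance divided by the reference length. The function name `cer`, the ratio (rather than percent) return value, and the pure-Python fallback are assumptions made for this example, not the repository's code; `editdistance.eval` is the package's distance call.

```python
try:
    # Third-party package named in the commit; implemented in C++.
    import editdistance

    def _levenshtein(ref: str, hyp: str) -> int:
        return editdistance.eval(ref, hyp)

except ImportError:
    # Fallback so the sketch runs without the package (much slower).
    def _levenshtein(ref: str, hyp: str) -> int:
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i]
            for j, h in enumerate(hyp, start=1):
                curr.append(
                    min(
                        prev[j] + 1,             # deletion
                        curr[j - 1] + 1,         # insertion
                        prev[j - 1] + (r != h),  # substitution or match
                    )
                )
            prev = curr
        return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a ratio: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return _levenshtein(reference, hypothesis) / len(reference)
```

For example, `cer("kitten", "sitting")` is 3 edits over 6 reference characters. Calling a plain edit-distance routine avoids the extra transformation and alignment work a general-purpose WER/CER library performs, which is the likely source of the speedup.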

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempting to resolve SH test errors; will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-c…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for Sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-c…
ngachchi pushed a commit to ngachchi/NeMo-text-processing that referenced this pull request Jun 23, 2025
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuation

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranged for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the comments from the last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding anands updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch (which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing bug raised in bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing the imported punct graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: Enno Hermann <Eginhard@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vitaly.lavrukhin@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Enas Albasiri <71229149+ealbasiri@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
ngachchi pushed a commit to ngachchi/NeMo-text-processing that referenced this pull request Jun 23, 2025
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing __init__.py for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <al…
ngachchi pushed a commit to ngachchi/NeMo-text-processing that referenced this pull request Jun 23, 2025
* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular normal space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853…
FredHaa pushed a commit to FredHaa/NeMo-text-processing that referenced this pull request Aug 15, 2025
* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
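
As a rough illustration of the change described in the commit above, character error rate (CER) can be computed directly from a character-level Levenshtein distance. The `editdistance` package exposes this as `editdistance.eval(a, b)`. The pure-Python `levenshtein` below is a stand-in for that call so the sketch runs without extra dependencies. The function names and the percent scaling are assumptions for illustration, not the code merged in this PR.

```python
def levenshtein(ref, hyp):
    """Character-level edit distance (insertions, deletions, substitutions).

    Stand-in for editdistance.eval(ref, hyp), which computes the same value natively.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i characters of ref
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """CER in percent relative to the reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) * 100.0 / len(reference)
```

With `editdistance` installed, the body of `cer` could call `editdistance.eval(reference, hypothesis)` in place of `levenshtein`.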
FredHaa pushed a commit to FredHaa/NeMo-text-processing that referenced this pull request Aug 15, 2025
* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arrangements for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers with eight

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrote tokenizer

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed the file and replaced it with char in 1.8

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins file update

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to fix tn bug@ xuesong

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* tn bug

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixes and updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adjustments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* testing commit

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates adapting to graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases for SH tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added some sentences

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test cases update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* solving rebase issue, repushing changes

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixings according to ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixings according to the ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* notused removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added sentences as test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removed commented out tests

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating dates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* attempts to fix bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* in process of fixing the bug

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing existing issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated graph_utils, tokenize and classify, and word graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* added back the postprocessor far creation

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated NEMO_NOT_ALPHA as a new variable

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* far files

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* combined into measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing and combined to measure

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to fix space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates to solve the space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving sh test issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding anands updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* data updated for measure and whitelist

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing fraction and math part

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing preprocessor, updating measure, adding whitelist cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing processor, modification for sp test, whitelist and word

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updating zh date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* realized itn being commented out, adding back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* trying to run zh tn separately because it takes long time to run

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* modification to run zh tn separately

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* independent zh tnitn tests for more time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding lines to save far file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates for reducing testing time

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* for punct graph

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing used graphs

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* format and removing used comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing this one, not used

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Delete tools/text_processing_deployment/zh directory

Removing far files.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* updates according to the github comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing comments

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* punct grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cases_cardinal.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Dockerfile

Copied from main branch ( which included Anand's updates)

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update launch.sh

Found differences in the file. Fixing it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Saw word ITN being commented out. Adding it back.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update money.py

Found cardinal grammar not accepting suffix. Fixed it.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkinsfile

Removed duplicated zh test from line 230s

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update utils.py

Addressing bug raised in bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update graph_utils.py

Addressing bug in graph_utils.py of zh ITN and decimal tagger of ar TN #162.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Fixing code style, removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update measure.py

Removing unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update post_processing.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

Removing unused import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing unused imports

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cardinal.py

Deleting unused graph

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

Removing import pynini

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update word.py

removing pynini import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update verbalize.py

removing pynutil import

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update post_processing.py

removing punct graph imported

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_sparrowhawk_normalization.sh

Update on test issue for Docker file locations

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_ordinal.py

Fixing style. 

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Delete nemo_text_processing/text_normalization/zh/verbalizers/math_symbol.py

Removing because it's not one of the semiotic classes.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

Updating Jenkins date

Sign…
FredHaa pushed a commit to FredHaa/NeMo-text-processing that referenced this pull request Aug 15, 2025
* IT TN improvement on tests (#120)

* add missing test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with time tests

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add sentence test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine shortest path for irregular cardinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add single letter exception for roman numerals (#121)

* add single letter exception for roman numerals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci dir

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix broken path for nondet whitelist (#124)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Increase weights for serial (en TN) (#128)

* Increase weights for serial (en TN)

Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Add tests for fix

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile cache path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Update Jenkinsfile. Fix cache folder

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measures file for FR TN (#131)

* add measures file

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update whitelist data

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add fr tn tests

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh jenkins (#127)

* Add SH tests to Jenkins

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkins tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add CI/CD tests for sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* docker build only if in test mode

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix comments and remove arguments not required

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix commands not executing

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing arguments

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Missing quotes

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix incorrect path for tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Incorrect paths of tests and shunit2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issues with paths as arguments to shunit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Undo path change

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix intentional fail test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* revert redundant check for cased option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix default path in export_grammars.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add interactive option

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add SH tests for cased EN ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update isort - fix precommit (#138)

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update isort version

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian itn (#136)

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added Armenian ITN

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context for tests and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Revert "Added context for tests and fixed CodeQL errors"

This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b.

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* Added context to some test files and fixed CodeQL errors

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unnecessary data

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* translated a few measurements to Armenian

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* adjusted some things for better readability and maintainer support

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed one test case and some issues

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix CI (#142)

* fix whitelist deployment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out tests to recreate grammars

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* shorten test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* cased for TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert debug changes

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix args default

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* try parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* debug parallel

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* rerun

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix sh tests for local SH launcher

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* enable all ci tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Armenian TN (#137)

* merged with main branch and fixed conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixing some more conflicts

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* fixed a minor issue

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* deleted unused imports

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix: add "hy" language option for armenian

Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>

* added optional space for measurements after cardinals/decimals

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* added Armenian dot

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru>
Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Marathi ITN (#134)

* Added Marathi ITN

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding jenkins test

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com>
Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins fix (#150)

* jenkins fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing armenian to troubleshoot jenkins

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* missing _init_ for python

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* mislabeled cache

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* r0.3.0 release (#151)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix text=line[text] to text=line[text_field] (#153)

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* use real string on docstring (#157)

Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Sh postprocess (#147)

* Add support for postprocessor far in sparrowhawk

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Choose between having a post processor or not

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update run_evaluate script for cased itn (#164)

* update run_evaluate script for cased itn

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* remove unused function from ar tn decimals (#165)

* remove unused function from ar tn decimals

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update ci date

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* ZH sentence-level TN (#112)

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add country codes from hu (#77)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix electronic case for username (#75)

* fix electronic username w/o .

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix ar test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci dirs, enable sv tests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.8 release (#79)

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Codeswitched ES/EN ITN  (#78)

* Initial commit for ES-EN codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable export for es_en codeswitched ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add whitelist, update weights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests for en_es, zone tagged separately in es

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix path to test data for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile - enable ES/EN tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add __init__.py files

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix issues with failed docker build - due to archiving of debian and issues with re2

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove unused imports and variables

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update date

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Enable NBSP in sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update copyrights

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache path in for ES/EN CI/CD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* minor normalize.py edit for usability (#84)

* electronic verbalizer fallback (#81)

* 0.1.8 release

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add elec fallback

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* documentation edits for grammar/clarity

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* added --output_field flag for command line interface

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish ITN (#40)

* force two digits for month

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put it in a function, because I reject the garbage pre-commit.ci came up with

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrap some more pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add graph pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete junk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser (copy from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add date verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add right tokens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks, more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to ITN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaks to TN date tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moved to tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nothing actually fixed here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* now most tests pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* electronic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bare fractions is a bit of an overreach

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just inverting the TN whitelist tagger will not work/be useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from English

Signed-off-by: Jim O'Regan <joregan@kth.se>

* overwrite with version from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swap tsv sides

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add optional_era variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make deterministic default, like most of the others

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also add lowercase versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replacing NEMO_SPACE does not work either

Signed-off-by: Jim O'Regan <joregan@kth.se>

* increasing weight... did not work last time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweaking test cases, in case it was a sentence splitting issue. It was not

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put the full stops back

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add filler words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try splitting this out to see if it makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* aha, this part should be non-deterministic only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single line only

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "increasing weight... did not work last time"

This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* disabling ITN here makes TN work again(?)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "disabling ITN here makes TN work again(?)"

This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing the variable name fixes norm tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing test tooling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy telephone fixes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a piece for area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add country codes from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend any_read_digit for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country/area codes for ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* first attempt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add to t&c

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove country codes for the time being, makes things ambiguous

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove trailing whitespace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update __init__.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transform of TN tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic transformation of TN decimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* slight changes to date

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* include space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* problem with tusen was not that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions from hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* respect my own copyright xD

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading, this has been an oddity before

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try changing this year declaration

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add year + era

Signed-off-by: Jim O'Regan <joregan@kth.se>

* eliminate more module-level data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "eliminate more module-level data loading"

This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expose variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extra param for itn mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change call

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix parens

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move data loading

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapt/extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix dict init/change keys to something useful

Signed-off-by: Jim O'Regan <joregan@kth.se>

* initial stab at prefixed numbers

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert kl. if absent

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* the relative prefixed times

Signed-off-by: Jim O'Regan <joregan@kth.se>

* + comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* enable time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space in both directions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix hours to

Signed-off-by: Jim O'Regan <joregan@kth.se>

* split by before/after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix if

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. 9

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from en

Signed-off-by: Jim O'Regan <joregan@kth.se>

* keep only get_abs_path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add trimmed file

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* two abs_paths... could be fun

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minutes/seconds

Signed-off-by: Jim O'Regan <joregan@kth.se>

* suffix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete, not insert

Signed-off-by: Jim O'Regan <joregan@kth.se>

* one optional

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* kl. or one of suffix/zone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* already disambiguated

Signed-off-by: Jim O'Regan <joregan@kth.se>

* closure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do not insert kl.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix spelling

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Delete measure.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete money.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused test pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add/update __init__

Signed-off-by: Jim O'Regan <joregan@kth.se>

* blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix lang

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove year, conflicts with cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* space before, not after

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cardinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* spurious deletion

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* re-enable SV TN; enable SV ITN

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "re-enable SV TN; enable SV ITN"

This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix singulars

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use cdrewrite

Signed-off-by: Jim O'Regan <joregan@kth.se>

* just EOS/BOS

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* omit en/ett, because they are also articles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* strip spaces from decimal part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partial fix, not what I wanted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* en/ett cannot work in itn case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* be more deliberate in graph construction

Signed-off-by: Jim O'Regan <joregan@kth.se>

* accept both

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* +2 tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* (try to) accept singular quantities for plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* retry

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* oops

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace

Signed-off-by: Jim O'Regan <joregan@kth.se>

* arcmap

Signed-off-by: Jim O'Regan <joregan@kth.se>

* version without ones

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of this, this is where it goes wrong

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* multiple states from both ones, try removing and readding

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove ones, see if that fixes at least the bare quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, dunno why it still breaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move definition

Signed-off-by: Jim O'Regan <joregan@kth.se>

* simplify

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tweak

Signed-off-by: Jim O'Regan <joregan@kth.se>

* another test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local declaration, seems to not be working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* match verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix last two failing tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing tests for telephone and word

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of convert_space, tests fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "put convert_spaces back, change test file; pytest fails"

This reverts commit a7bb7489137b8026aab02aff64df39e874630043.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put convert_spaces back, change test file; pytest fails, take 2

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try converting the non-breaking spaces in the shell script

Signed-off-by: Jim O'Regan <joregan@kth.se>

* wrong place

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update date.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update time.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix comment

Signed-off-by: Jim O’Regan <joregan@kth.se>

* trim comments

Signed-off-by: Jim O’Regan <joregan@kth.se>

* remove commented line

Signed-off-by: Jim O’Regan <joregan@kth.se>

* en halv

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Update test_sparrowhawk_inverse_text_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Italian_TN (#67)

* add TN italian

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix init

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix LOCATION

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* modify graph_utils

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* correct decimals

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix electronic

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

* fix measure

Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh itn (#74)

* Add ZH ITN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix copyrights and code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Remove invalid tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve CodeQL issues

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing 'zh' option for ITN and correct comment

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update __init__.py

Change to zh instead of en for the imports.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for decimal test data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* update for language import

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update for Chinese punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* a new class for whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* PYNINI_AVAILABLE = False

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to file import format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* recreated due to format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* caught duplicates, removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates, arranges for Chinese Yuan updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates accordingly to the comments from last PR. Recreated some of the files due to format issues

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the hours_to and minute_to files used for back counting. Also removed am and pm suffix files according to the last PR. Recreated some of them for format issue

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* re-added this file to avoid data file import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated grammar according to last PR. Removed the acceptance of 千

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated according to last PR. Removed comma after decimal points

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for Fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar for money and updated according to last PR. Plus process of 元

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar. updates due to the updates in cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* arrangements

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added whitelist grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* word grammar for non-classified items

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated cardinal, decimal, time, itn data

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to last PR

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to the updates for cardinal grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for more Mandarin punctuations

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to last PR. removing am pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* adjustment on the weight

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the tagger updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated accordingly to the time tagger

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates according to changes in tagger on am and pm

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* verbalizer for fraction

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for mandarin grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* kept this file because using English utils results in data naming error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* merge conflict

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused import os

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* deleted unused variables

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and edits based on pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue, recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* format issue recreated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed coding style/format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed coding style and format

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicated graph for 毛

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed the comment

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unnecessary comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* unnecessary comment removed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test file updated for more cases

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated with a comment explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated the file explaining why this file is kept

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added Mandarin as zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing for duplication

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused NEMO objects

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed duplicates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix test file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to fix file failures

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to resolve test case failure

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates to adapt to cardinal grammar changes

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fix style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixing pr checks

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed // for zhtn/itn cache

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Update inverse_normalize.py

Added zh as a selection to pass Jenkins checks.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: Alex Cui <alcui@nvidia.com>
Co-authored-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updated pynini_export.py file to create far files (#88)

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* readd Swedish (#87)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn 0712 (#89)

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates and fixes according to document on national guideline

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* Decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fraction updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* money updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* ordinal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* punctuation grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* time grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* tokenizer updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates on certificate

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* data updated and added due to updates and changes to the existing grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* cardinal updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* date grammar changed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* decimal grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* grammar updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test data added

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test python file edits

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for tn1.0 and previous tn grammar from contribution

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* test cases updated

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fixed

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* dates updated for init files

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated the date for zh

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed unused imports

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* removed comments

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added back the itn tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added back measure and math from previous TN

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated for tests reruns

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated weights

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zh tn char (#95)

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name change

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* file name

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* fixed import error

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* audio-based TN fix for empty pred_text/text (#92)

* fix for empty pred_text

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unittests

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix path

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix pytest

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pip 1.2.0

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* French tn (#91)

* add tests for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add fr tn for cardinals, decimals, fractions and ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete it far files from tools

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add languages to run_evaluate

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* remove ambiguous spacing

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* enable sh testing for fr tn

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix bug with ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile cache date

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix test for ordinals

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update tn cache for fr

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* resolve codeql issues

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist_tech.tsv (#96)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Zhitn 0727 (#93)

* updates on itn grammar to pass sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for sparrowhawk tests

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* coding style fix

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updates for coding style and sparrowhawk test

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* updated classes for tests on whitelist and word grammar

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for tests on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added for test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on whitelist

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* added to run test on word

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_word.py

Removed unused import.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_word.py

Removed imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Removing imports according to CodeQL

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update test_whitelist.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* Update Jenkinsfile

changed zh cache to 07-27-23 as it is the latest update.

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es tn romans fix (#98)

* fix es tn roman exceptions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update jenkinsfile

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update eval script for ITN

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* codeql fix

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Change docker image (#102)

Change docker image to one including sparrowhawk

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Print warning instead of exception (#97)

* raise text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* text arg

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Failed text

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add logger

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* logger

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* NeMo-text-processing

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* info level

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm raise

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalizer.select_verbalizer

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Exception

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* verbose

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restart ci

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com>
Co-authored-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* warning regardless of verbose flag (#107)

* warning

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* self.verbose

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Unpin setuptools (#106)

Signed-off-by: Peter Plantinga <plantinga.peter@proton.me>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed warnings: File is not always closed. (#113)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug #111 (ar currencies) (#117)

* fix bug #111 (ar currencies)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* update ci folder

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Logging clean up + IT TN fix (#118)

* fix utils and it TN

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix logging

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix format

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* add IT TN to CI

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update patch

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Time_IT_TN (#105)

* add time verbalizer

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add time tagger and verba

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add pytest time

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeQL

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix numbers …