Merged
165 commits
a22da43
Swedish telephone fix (#60)
jimregan May 11, 2023
45f2e58
log instead of print in graph_utils.py (#68)
eginhard May 17, 2023
a7dd550
CER estimation speedup for audio-based text normalization (#73)
vsl9 May 27, 2023
de622b3
add measure coverage for TN and ITN (#62)
ealbasiri Jun 6, 2023
34e761e
upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)
mgrafu Jun 6, 2023
2eb1cd2
add country codes from hu (#77)
jimregan Jun 8, 2023
5f7de06
fix electronic case for username (#75)
ekmb Jun 8, 2023
009f723
0.1.8 release (#79)
ekmb Jun 13, 2023
143ff75
Codeswitched ES/EN ITN (#78)
anand-nv Jun 14, 2023
0263378
electronic verbalizer fallback (#81)
ekmb Jun 20, 2023
1168dc7
minor normalize.py edit for usability (#84)
lleaver Jun 28, 2023
54a9fd4
Swedish ITN (#40)
jimregan Jun 29, 2023
0df10a2
Italian_TN (#67)
GiacomoLeoneMaria Jun 29, 2023
2fd5270
Zh itn (#74)
BuyuanCui Jun 30, 2023
1312367
updated pynini_export.py file to create far files (#88)
BuyuanCui Jul 6, 2023
68f482f
readd Swedish (#87)
jimregan Jul 17, 2023
a9aa462
Zh tn 0712 (#89)
BuyuanCui Aug 7, 2023
f5fce61
Zh tn char (#95)
BuyuanCui Aug 8, 2023
9e994d1
audio-based TN fix for empty pred_text/text (#92)
ekmb Aug 15, 2023
fdad64e
pip 1.2.0
ekmb Aug 15, 2023
9bd65c8
French tn (#91)
mgrafu Aug 25, 2023
7678c51
Add whitelist_tech.tsv (#96)
anand-nv Aug 29, 2023
b5ce536
Zhitn 0727 (#93)
BuyuanCui Sep 4, 2023
6fa8cc0
Es tn romans fix (#98)
mgrafu Sep 6, 2023
b5b18b4
Change docker image (#102)
anand-nv Sep 7, 2023
4473d6f
Print warning instead exception (#97)
karpnv Sep 27, 2023
2dd40ff
warning regardless of verbose flag (#107)
karpnv Oct 3, 2023
42aa7d3
Unpin setuptools (#106)
pplantinga Oct 4, 2023
9d2b2e3
fixed warnings: File is not always closes. (#113)
XuesongYang Oct 10, 2023
a866742
fix bug #111 (ar currencies) (#117)
mgrafu Oct 23, 2023
739e4a2
Logging clean up + IT TN fix (#118)
ekmb Oct 24, 2023
a737374
Time_IT_TN (#105)
GiacomoLeoneMaria Oct 25, 2023
1b9800f
IT TN improvement on tests (#120)
mgrafu Oct 26, 2023
304ed7c
add single letter exception for roman numerals (#121)
mgrafu Oct 27, 2023
ae0e0bc
rewrote tokenizer
BuyuanCui Oct 5, 2023
cd9d786
removed the file and replaced it with char in 1.8
BuyuanCui Oct 5, 2023
071aad3
jenkins file update
BuyuanCui Oct 6, 2023
97b71c2
to fix tn bug@ xuesong
BuyuanCui Oct 19, 2023
d61c913
tn bug
BuyuanCui Oct 19, 2023
04e440b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 19, 2023
5951c5a
fixeds and updates
BuyuanCui Oct 19, 2023
27c3887
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 19, 2023
45707d0
adjustments
BuyuanCui Oct 30, 2023
4d41adb
testing commit
BuyuanCui Nov 1, 2023
a5de011
removing unsed file
BuyuanCui Nov 1, 2023
633acf8
updated test cases
BuyuanCui Nov 3, 2023
0026e92
updating etst cases
BuyuanCui Nov 4, 2023
25a206c
updates adapting to graphs
BuyuanCui Nov 7, 2023
a759797
updated cases for SH tests
BuyuanCui Nov 8, 2023
c1c926b
updated cases
BuyuanCui Nov 13, 2023
01f54e6
added some sentences
BuyuanCui Dec 1, 2023
cc9e5bf
test cases update
BuyuanCui Dec 12, 2023
fca36eb
solving rebase issue, repushing changes
BuyuanCui Dec 12, 2023
0be4e23
resolving conflict
BuyuanCui Dec 12, 2023
0f914cd
Merge branch 'main' into zh_tn_oct5_update
BuyuanCui Dec 12, 2023
78b51a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2023
41174da
fixings according to ci
BuyuanCui Dec 12, 2023
8323fa4
fixings according to the ci
BuyuanCui Dec 12, 2023
f98f9ad
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Dec 12, 2023
f5903f4
removed not used
BuyuanCui Dec 12, 2023
e8623c9
notused removing
BuyuanCui Dec 12, 2023
eb7971d
format issue
BuyuanCui Dec 12, 2023
5b091a3
formt issue
BuyuanCui Dec 12, 2023
193ffe8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2023
b1db6c5
removing unused files
BuyuanCui Dec 15, 2023
e136933
removing unused files
BuyuanCui Dec 15, 2023
012b8bc
remiving unsed files;
BuyuanCui Dec 15, 2023
c46b3c2
removing unsed files
BuyuanCui Dec 15, 2023
06e41df
removing unsed files
BuyuanCui Dec 15, 2023
cb8159a
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Dec 15, 2023
4cb65a7
added sentences as test cases
BuyuanCui Dec 15, 2023
1a126bf
added senetnces as test cases
BuyuanCui Dec 15, 2023
53bf81f
removed commentyed out tests
BuyuanCui Dec 19, 2023
b0f0474
updating dates
BuyuanCui Dec 19, 2023
8910562
attemps to fix bug
BuyuanCui Dec 26, 2023
431da95
inprocess of fixing the bug
BuyuanCui Jan 3, 2024
3f352f6
fixing existing issue
BuyuanCui Jan 24, 2024
e58d6f8
updated graph_utils, tokenize and classify, and word graphs
BuyuanCui Feb 7, 2024
1ed9f80
added bacl the ppostprocessor far creation
BuyuanCui Feb 7, 2024
58c0c35
updated NEMO_NOT_ALPHA as a new variable
BuyuanCui Feb 15, 2024
7d8aaca
far files
BuyuanCui Feb 15, 2024
ad94fe5
Merge branch 'main' into zh_tn_oct5_update
BuyuanCui Feb 15, 2024
8bedcd6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 15, 2024
0ef73b9
combiedn into measure
BuyuanCui Feb 15, 2024
1db8c96
removing and combined to meaasure
BuyuanCui Feb 15, 2024
0a69d22
removing, not used
BuyuanCui Feb 15, 2024
cad806c
mergeing for the existring tn update
BuyuanCui Feb 15, 2024
fb08fc4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 15, 2024
2f9473a
updates to fix space issue
BuyuanCui Mar 13, 2024
8edb96d
updates to fix space issue
BuyuanCui Mar 13, 2024
36494cf
updates to fix space issue
BuyuanCui Mar 13, 2024
cb9a6c1
updates to solve the space issue
BuyuanCui Mar 13, 2024
fce448f
resolving sh issue
BuyuanCui Mar 20, 2024
6ece2bd
resolving sh test issue
BuyuanCui Mar 20, 2024
6efc8d5
adding anands updates
BuyuanCui Mar 20, 2024
ef3bd23
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Mar 20, 2024
fa7668f
data updated for measure and whitelist
BuyuanCui Apr 1, 2024
94352a9
updates
BuyuanCui Apr 1, 2024
43d608f
updates
BuyuanCui Apr 1, 2024
d2d9076
updates
BuyuanCui Apr 1, 2024
f8f2ec3
removing fraction and math part
BuyuanCui Apr 1, 2024
9e2b288
removing comments
BuyuanCui Apr 1, 2024
2506b0b
removing preprocessor, updating measure, adding shitelist cases
BuyuanCui Apr 1, 2024
eaf8be3
removing processor, modification for sp test, shitelist and word
BuyuanCui Apr 1, 2024
d980c0b
updating zh date
BuyuanCui Apr 2, 2024
b726aa2
Merge branch 'main' into zh_tn_oct5_update
BuyuanCui Apr 2, 2024
8b0fa4a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2024
fb6d7b8
realized itn being cvommented out, adding back
BuyuanCui Apr 2, 2024
d558778
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Apr 2, 2024
215be06
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 2, 2024
1931fda
trying to run zh tn separately because it takes long time to run
BuyuanCui Apr 3, 2024
b28dbd4
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Apr 3, 2024
3a8faf4
modification to ru zh tn separately
BuyuanCui Apr 3, 2024
43ac3de
independent zh tnitn tests for more time
BuyuanCui Apr 3, 2024
80a43bc
adding lines to save far file
BuyuanCui Apr 3, 2024
376d34c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 3, 2024
fe930e4
updates for reducing testing time
BuyuanCui Apr 8, 2024
6bafca4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 8, 2024
9254658
for ounct graph
BuyuanCui Apr 8, 2024
e15c5c9
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Apr 8, 2024
f8efd81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 8, 2024
653d365
removing used graphs
BuyuanCui Apr 8, 2024
30564fa
format and removing used comments
BuyuanCui Apr 8, 2024
29ee1d8
removing this one, not used
BuyuanCui Apr 8, 2024
eb87a8e
remove unused commentss
BuyuanCui Apr 8, 2024
6245414
removing unsed comments
BuyuanCui Apr 8, 2024
d86e123
removing unsed comments
BuyuanCui Apr 8, 2024
4e65a12
removing comments
BuyuanCui Apr 8, 2024
4667e9c
Delete tools/text_processing_deployment/zh directory
BuyuanCui Apr 8, 2024
81710a6
updates according to the github comments
BuyuanCui Apr 8, 2024
3da4b24
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Apr 8, 2024
1b4c52d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 8, 2024
fd573fb
removing comments
BuyuanCui Apr 11, 2024
4fa1d76
punct grammar
BuyuanCui Apr 11, 2024
1c15111
Merge branch 'zh_tn_oct5_update' of https://github.com/NVIDIA/NeMo-te…
BuyuanCui Apr 11, 2024
d788b65
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 11, 2024
46064b2
Update test_cases_cardinal.txt
BuyuanCui Apr 16, 2024
4e7e9cd
Update Dockerfile
BuyuanCui Apr 16, 2024
e4a3e8e
Update launch.sh
BuyuanCui Apr 16, 2024
cde7782
Update test_word.py
BuyuanCui Apr 16, 2024
54f0223
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 16, 2024
ea3f78a
Update money.py
BuyuanCui Apr 18, 2024
b99634b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2024
94ee1f5
Update Jenkinsfile
BuyuanCui Apr 18, 2024
2841319
Update utils.py
BuyuanCui Apr 18, 2024
8709751
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2024
b1faa7d
Update graph_utils.py
BuyuanCui Apr 18, 2024
a1129b7
Update measure.py
BuyuanCui Apr 18, 2024
396b212
Update word.py
BuyuanCui Apr 18, 2024
e3c2adb
Update measure.py
BuyuanCui Apr 18, 2024
5563b63
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2024
6340746
Update post_processing.py
BuyuanCui Apr 18, 2024
f32af56
Update post_processing.py
BuyuanCui Apr 19, 2024
2142cd3
Update word.py
BuyuanCui Apr 24, 2024
85c99cb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 24, 2024
329cb13
Update cardinal.py
BuyuanCui Apr 24, 2024
699b5bb
Update word.py
BuyuanCui Apr 25, 2024
4d52a18
Update word.py
BuyuanCui Apr 25, 2024
1403b3a
Update verbalize.py
BuyuanCui Apr 25, 2024
35b556f
Update post_processing.py
BuyuanCui Apr 25, 2024
011a0ff
Update test_sparrowhawk_normalization.sh
BuyuanCui Apr 26, 2024
bdfcb46
Update test_ordinal.py
BuyuanCui Apr 26, 2024
38d99a4
Delete nemo_text_processing/text_normalization/zh/taggers/math_symbol.py
BuyuanCui Apr 29, 2024
99ee464
Delete nemo_text_processing/text_normalization/zh/verbalizers/math_sy…
BuyuanCui Apr 29, 2024
cc9d49e
Update Jenkinsfile
BuyuanCui Apr 30, 2024
36 changes: 24 additions & 12 deletions Jenkinsfile
@@ -22,7 +22,7 @@ pipeline {
 RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
-ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-27-23-0'
+ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/04-30-24-0'
 IT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-26-23-0'
 HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
 MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
@@ -189,7 +189,7 @@ pipeline {
 }
 }

-stage('L0: Create RU TN/ITN Grammars & SV & PT & ZH') {
+stage('L0: Create RU TN/ITN Grammars & SV & PT') {
 when {
 anyOf {
 branch 'main'
@@ -228,16 +228,6 @@ pipeline {
 sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=pt --text="dez " --cache_dir ${PT_TN_CACHE}'
 }
 }
-stage('L0: ZH TN grammars') {
-steps {
-sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
-}
-}
-stage('L0: ZH ITN grammars') {
-steps {
-sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="二零零二年一月二十八日 " --cache_dir ${ZH_TN_CACHE}'
-}
-}
 }
 }

@@ -257,9 +267,31 @@ pipeline {
 }
 }
 }
+stage('L0: Create ZH TN/ITN Grammar') {
+when {
+anyOf {
+branch 'main'
+changeRequest target: 'main'
+}
+}
+failFast true
+parallel {
+stage('L0: ZH ITN grammars') {
+steps {
+sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=zh --text="你" --cache_dir ${ZH_TN_CACHE}'
+}
+}
+stage('L0: ZH TN grammars') {
+steps {
+sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=zh --text="6" --cache_dir ${ZH_TN_CACHE}'
+}
+}
+}
+}


// L1 Tests starts here

stage('L1: TN/ITN Tests CPU') {
when {
anyOf {
@@ -22,6 +22,8 @@
 from pynini.export import export
 from pynini.lib import byte, pynutil, utf8

+from nemo_text_processing.inverse_text_normalization.zh.utils import load_labels
+
 NEMO_CHAR = utf8.VALID_UTF8_CHAR
 NEMO_DIGIT = byte.DIGIT
 NEMO_HEX = pynini.union(*string.hexdigits).optimize()
14 changes: 14 additions & 0 deletions nemo_text_processing/inverse_text_normalization/zh/utils.py
@@ -60,3 +60,17 @@ def get_various_formats(text: str) -> List[str]:
         result.append(t.upper())
         result.append(t.capitalize())
     return result
+
+
+def load_labels(abs_path):
+    """
+    loads relative path file as dictionary
+
+    Args:
+        abs_path: absolute path
+
+    Returns dictionary of mappings
+    """
+    with open(abs_path, encoding="utf-8") as label_tsv:
+        labels = list(csv.reader(label_tsv, delimiter="\t"))
+    return labels
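The `load_labels` helper added in this hunk reads a tab-separated file with `csv.reader`, so despite the docstring's wording it yields a list of rows rather than a dictionary. The following is a minimal sketch of that behavior, not part of the PR. It assumes a hypothetical two-row file named `example.tsv`, with rows borrowed from the measure data elsewhere in this diff. The function body is copied from the hunk, and `import csv` is assumed to already exist at the top of `utils.py`, since the hunk does not show it.

```python
import csv
import os
import tempfile


def load_labels(abs_path):
    """Body copied from the hunk: returns a list of rows, one list per TSV line."""
    with open(abs_path, encoding="utf-8") as label_tsv:
        labels = list(csv.reader(label_tsv, delimiter="\t"))
    return labels


# Hypothetical label file written to a temporary directory for illustration only.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.tsv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("lb\t磅\nng\t纳克\n")
    rows = load_labels(path)

# Each element of `rows` is a list of column values, e.g. ["lb", "磅"].
# A caller that wants a dictionary has to build it explicitly.
mapping = {key: value for key, value in rows}
```

If a dictionary is needed, the caller has to build it from the returned rows, as `mapping` does here.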
@@ -70,3 +70,5 @@
<
>
@@ -5,3 +5,4 @@
 × 乘
 ÷ 除
 ° 度
+- 减
@@ -1,7 +1,5 @@
 amu 原子质量
 bar 巴
-° 度
-º 度
 °c 摄氏度
 °C 摄氏度
 ºc 摄氏度
@@ -40,23 +38,6 @@ kw 千瓦
 kW 千瓦
 lb 磅
 lbs 磅
-m2 平方米
-m² 平方米
-m3 立方米
-m³ 立方米
-mbps 兆比特每秒
-mg 毫克
-mhz 兆赫兹
-mi2 平方英里
-mi² 平方英里
-mi 英里
-min 分钟哦
-ml 毫升
-mm2 平方毫米
-mm² 平方毫米
-mol 摩尔
-mpa 兆帕
-mph 英里每小时
 ng 纳克
 nm 纳米
 ns 纳秒
@@ -80,13 +61,7 @@ gb 吉字节
gpa 吉帕斯卡
gy 戈瑞
ha 公顷
m 米
mm 毫米
ms 毫秒
mv 毫伏
mw 毫瓦
pg 皮克
ps 皮秒
s 秒
ms 毫秒
g 克
211 changes: 0 additions & 211 deletions nemo_text_processing/text_normalization/zh/data/measure/units_zh.tsv

This file was deleted.

@@ -168,7 +168,6 @@ Ft 匈牙利福林
 ₪ 以色列谢克尔
 J$ 牙买加元
 лв 哈萨克斯坦腾格
-₩ 朝鲜园
 лв 吉尔吉斯斯坦索姆
 ₭ 老挝基普
 ден 马其顿代纳尔
@@ -0,0 +1,9 @@
+1 一
+2 两
+3 三
+4 四
+5 五
+6 六
+7 七
+8 八
+9 九
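The nine rows in this new file pair each ASCII digit from 1 to 9 with a Chinese numeral, with 2 mapped to 两 rather than 二. The following is a minimal sketch of how such a table might be looked up, not the PR's method. It assumes the columns are tab-separated, and `load_digit_map` and `digit_to_zh` are hypothetical helpers. The captured diff does not show the file's name or how the grammars actually consume it.

```python
import csv
import io

# Rows copied from the hunk above; the tab delimiter is an assumption.
TABLE = "1\t一\n2\t两\n3\t三\n4\t四\n5\t五\n6\t六\n7\t七\n8\t八\n9\t九\n"


def load_digit_map(tsv_text: str) -> dict:
    """Hypothetical helper: parse two-column TSV text into a digit -> numeral dict."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {src: dst for src, dst in rows}


DIGITS = load_digit_map(TABLE)


def digit_to_zh(ch: str) -> str:
    """Hypothetical helper: map a single digit, leaving any other character unchanged."""
    return DIGITS.get(ch, ch)
```

Because the table has no row for 0, `digit_to_zh` passes "0" through unchanged in this sketch.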