Description
Rule conflict between the MoneyFst and SerialFst taggers
Steps/Code to reproduce bug
Command:
python nemo_text_processing/text_normalization/normalize.py --verbose --text 'Thank you for the quantities. Now, lets talk about the pricing. The price for each canned salmon is $5, each bottle of peanut butter is $3'
Output:
Thank you for the quantities. Now, lets talk about the pricing. The price for each canned salmon is five dollars, each bottle of peanut butter is dollar three
Expected behavior
Expected output:
Thank you for the quantities. Now, lets talk about the pricing. The price for each canned salmon is five dollars, each bottle of peanut butter is three dollars
Environment overview
- Environment location: Bare-metal
- Method of NeMo install: pip install
Environment details
- OS version: Fedora 38
- PyTorch version: 2.0.0
- Python version: 3.10.10
Additional information
I found that there is a conflict between the MoneyFst and SerialFst taggers.
Both taggers return the same weight, 2404.29785, computed with pynini.shortestdistance(tagged_lattice, delta=10**-8)[-1], where the lattice is built by:
    tagged_lattice = self.find_tags(text)
This is due to the following code:
NeMo-text-processing/nemo_text_processing/text_normalization/en/taggers/tokenize_and_classify.py
Lines 163 to 176 in 5dd753a
    classify = (
        pynutil.add_weight(whitelist_graph, 1.01)
        | pynutil.add_weight(time_graph, 1.1)
        | pynutil.add_weight(date_graph, 1.09)
        | pynutil.add_weight(decimal_graph, 1.1)
        | pynutil.add_weight(measure_graph, 1.1)
        | pynutil.add_weight(cardinal_graph, 1.1)
        | pynutil.add_weight(ordinal_graph, 1.1)
        | pynutil.add_weight(money_graph, 1.1)
        | pynutil.add_weight(telephone_graph, 1.1)
        | pynutil.add_weight(electonic_graph, 1.1)
        | pynutil.add_weight(fraction_graph, 1.1)
        | pynutil.add_weight(range_graph, 1.1)
        | pynutil.add_weight(serial_graph, 1.1001)  # should be higher than the rest of the classes
serial_graph's weight (1.1001) is meant to be higher than money_graph's (1.1), so the money path should always win, but on this input it does not. To check this, I disabled MoneyFst for this text (changing its output label so that the measured weight is guaranteed to come from a best path that contains SerialFst) and varied serial_graph's weight in ClassifyFst.classify. These are the resulting shortest distances:
serial_graph weight   Shortest distance
1.1000                2404.29785
1.1001                2404.29785
1.1002                2404.2981
1.1003                2404.29858
1.1004                2404.29858
1.1005                2404.29883
1.1006                2404.29883
1.1007                2404.29907
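One explanation consistent with the table above, assuming the lattice uses pynini's default standard arc type (tropical weights stored as 32-bit floats, as in OpenFst's StdArc): near 2404, adjacent float32 values are 2**-12 ≈ 0.000244 apart, so the 0.0001 margin between serial_graph and money_graph is smaller than half a step and gets rounded away. The sketch below only emulates float32 addition with Python's struct module; it does not rebuild the lattice, and the helper names (to_f32, f32_add, f32_ulp) are illustrative.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_add(a: float, b: float) -> float:
    """Add two values the way a float32 accumulator would.

    Operands are rounded to float32 first; float64 has more than twice the
    precision of float32, so rounding the float64 sum once more yields the
    correctly rounded float32 sum.
    """
    return to_f32(to_f32(a) + to_f32(b))


def f32_ulp(x: float) -> float:
    """Gap between float32(x) and the next larger float32 value (x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_f32(x)


# Reported shortest distance with serial_graph weighted 1.1 (from the table).
total = to_f32(2404.29785)
# Margin that is supposed to make serial_graph costlier than money_graph.
gap = 1.1001 - 1.1

print(f32_ulp(total))                # 0.000244140625, i.e. 2**-12
print(f32_add(total, gap) == total)  # True: float32 rounds the margin away
print(total + gap == total)          # False: float64 would keep it
```

This also fits the step pattern in the table: 1.1001 leaves the total unchanged, while 1.1002 is the first weight whose extra cost exceeds half a float32 step.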
English is not my native language, so please forgive me if there is any ambiguity.