Remove punct normalizer?

We are using the Moses punctuation normalizer for scoring [here](https://github.com/bitextor/monocleaner/blob/7402163507de6646875d1130202727b201243068/src/monocleaner/lm.py#L131), and it takes 60% of the scoring time. It is there because, back in the SMT days, we applied it to everything, but in this scenario it may not make much difference. Or, if we still want to normalize punctuation to reduce unks, could we write a much faster regex-based implementation? Right now the time cost feels high compared to the benefit it gives.
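
To make the regex-based idea concrete, here is a minimal sketch of what a lighter normalizer could look like. It is not the Moses rule set and not a drop-in replacement: the function name `normalize_punct` and the character table are hypothetical, and the mapping covers only a handful of common Unicode punctuation variants. It assumes most of the unk reduction comes from folding quotes, dashes and special spaces to ASCII, which would need to be checked against the language models (and whether they were trained on Moses-normalized text).

```python
import re

# Hypothetical table of single-character replacements (NOT the full Moses rules).
# str.translate applies all of them in a single pass over the string.
_CHAR_MAP = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # low double quotation mark
    "\u00ab": '"',    # left guillemet
    "\u00bb": '"',    # right guillemet
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u00b4": "'",    # acute accent used as apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})

# One precompiled regex to collapse runs of spaces left after the mapping.
_MULTI_SPACE = re.compile(r" {2,}")


def normalize_punct(text: str) -> str:
    """Fold common Unicode punctuation to ASCII and squeeze repeated spaces."""
    text = text.translate(_CHAR_MAP)
    return _MULTI_SPACE.sub(" ", text).strip()
```

For example, `normalize_punct("it\u2019s\u00a0fine")` returns `"it's fine"`. Doing the character folding with `str.translate` keeps the work to one pass plus a single regex, instead of a long chain of substitutions per sentence; whether that is enough to keep the unk rate acceptable is something to measure rather than assume.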
