We are using the Moses punctuation normalizer for scoring here, and it takes 60% of the scoring time. It is there because, back in the SMT days, we applied it to everything, but in this scenario it may not affect the results much. Alternatively, if we still want to normalize punctuation to reduce unks, could we write a much faster regex-based implementation? Right now the time cost feels high compared to the benefit it provides.
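
To make the proposal concrete, here is a minimal sketch of what a faster normalizer might look like. It is not a port of the Moses rules: the function name `fast_normalize_punct` and the character mapping are hypothetical, and the mapping covers only an illustrative subset of punctuation variants. Single-character substitutions go through one `str.translate` pass, and a single precompiled regex collapses whitespace. Whether this is fast enough, and whether the reduced rule set hurts scoring, would need to be measured.

```python
import re

# Illustrative subset of substitutions (an assumption, NOT the full Moses rule set).
# str.translate applies all single-character mappings in one pass over the string.
_CHAR_MAP = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # double low-9 quotation mark
    "\u00ab": '"',    # left-pointing double angle quotation mark
    "\u00bb": '"',    # right-pointing double angle quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})

# Precompiled once at import time so it is not rebuilt per sentence.
_WHITESPACE_RUN = re.compile(r"\s+")


def fast_normalize_punct(text: str) -> str:
    """Map common punctuation variants to ASCII and collapse whitespace."""
    text = text.translate(_CHAR_MAP)
    return _WHITESPACE_RUN.sub(" ", text).strip()


if __name__ == "__main__":
    print(fast_normalize_punct("\u201cHello\u201d \u2013  world\u2026"))
```

Any rule from the current normalizer that turns out to matter for scoring could be added to the mapping or as an extra precompiled pattern, so the cost grows only with the rules we actually need.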