Serval currently preprocesses each parallel corpus independently. This can produce unexpected behavior when multiple parallel corpora reference the same monolingual corpora or files. For example, a book could be drafted through one parallel corpus even though it is already included in the training data through a separate parallel corpus. One option is to merge at the monolingual corpus or file level instead.
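As a rough illustration of what merging at the file level could look like, here is a minimal Python sketch. It is not Serval's implementation: the dictionary layout, the `train_on` and `pretranslate` keys, and the names `FileUsage`, `merge_by_file`, and `find_conflicts` are all hypothetical, chosen only to show the idea of keying the filters by file rather than by parallel corpus.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileUsage:
    """Hypothetical record of how one monolingual file is used across all parallel corpora."""
    train_books: set[str] = field(default_factory=set)
    draft_books: set[str] = field(default_factory=set)


def merge_by_file(parallel_corpora: list[dict]) -> dict[str, FileUsage]:
    """Collect training and drafting selections per file ID, across every parallel corpus."""
    merged: dict[str, FileUsage] = defaultdict(FileUsage)
    for corpus in parallel_corpora:
        for file_ref in corpus["files"]:
            usage = merged[file_ref["file_id"]]
            usage.train_books.update(file_ref.get("train_on", ()))
            usage.draft_books.update(file_ref.get("pretranslate", ()))
    return dict(merged)


def find_conflicts(merged: dict[str, FileUsage]) -> dict[str, set[str]]:
    """Return, per file, the books that are both trained on and drafted."""
    conflicts = {}
    for file_id, usage in merged.items():
        overlap = usage.train_books & usage.draft_books
        if overlap:
            conflicts[file_id] = overlap
    return conflicts


# Two parallel corpora (assumed shape) that reference the same file.
corpora = [
    {"id": "pc-1", "files": [{"file_id": "source-file-1", "train_on": ["MAT", "MRK"]}]},
    {"id": "pc-2", "files": [{"file_id": "source-file-1", "pretranslate": ["MRK", "LUK"]}]},
]
conflicts = find_conflicts(merge_by_file(corpora))
# conflicts == {"source-file-1": {"MRK"}}
```

Processing each parallel corpus on its own would never see that `MRK` is trained on in `pc-1` and drafted in `pc-2`. Once the selections are keyed by file, the overlap is visible and could be resolved in one place, for example by excluding the drafted book from training.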