It uses data-science techniques to compare text files for duplicate, redundant text. It does not use simple text comparison; it attempts semantic comparison, so it can produce false positives and false negatives. Always check the output by hand.
pip install simtext
simtext myfiles
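
To illustrate why a similarity-based comparison needs a manual check, here is a minimal sketch of fuzzy duplicate detection. It is not simtext's actual algorithm, which this text does not describe; it stands in a plain word-count cosine similarity, and the function names and the 0.8 threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    # An empty text has no words to compare, so treat it as dissimilar.
    return dot / norm if norm else 0.0


def likely_duplicates(paragraphs, threshold=0.8):
    """Return (i, j, score) for paragraph pairs scoring at or above threshold.

    The threshold is a hypothetical default chosen for this sketch.
    """
    hits = []
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            score = cosine_similarity(paragraphs[i], paragraphs[j])
            if score >= threshold:
                hits.append((i, j, score))
    return hits


if __name__ == "__main__":
    paragraphs = [
        "The cat sat on the mat.",
        "On the mat the cat sat.",
        "Install the package with pip.",
    ]
    for i, j, score in likely_duplicates(paragraphs):
        print(f"paragraphs {i} and {j} look similar (score {score:.2f})")
```

Any threshold-based score of this kind can err in both directions: two passages that share most of their words but differ in meaning are flagged (a false positive), while a paraphrase that uses different words is missed (a false negative). That is why the output should be reviewed by hand.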