This Python script first iterates through one of the input FASTA files to count how many records it contains. The number of ZMW records to keep is then four times that count, multiplied by the percentage of reads to retain (75% in our case).
The script then loops through all the FASTA files and retains the longest reads. It does this by storing the read lengths in a dictionary, with the ZMW numbers as the values, while a counter tracks the number of ZMWs seen. Once the counter reaches the threshold described above, the shortest reads are removed. If a ZMW is seen more than once, only its longest read is retained.
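A streaming version of this selection might look like the sketch below. Note that it swaps in a different data layout from the one described above: the dictionary is keyed by ZMW (so a repeated ZMW can be resolved with a single lookup), and a min-heap with lazy deletion finds the shortest retained read whenever the limit is exceeded. It also assumes PacBio-style headers such as `>movie/zmw/start_end`, with the ZMW identified by the first two `/`-separated fields. The names `zmw_lengths`, `keep_longest_per_zmw`, and `max_zmws` are hypothetical; `max_zmws` plays the role of the threshold described above.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple


def zmw_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (zmw_id, read_length) for each FASTA record.

    Assumes PacBio-style headers '>movie/zmw/start_end'; the ZMW id is
    taken to be 'movie/zmw'.
    """
    zmw, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if zmw is not None:
                yield zmw, length
            name = (line[1:].split() or [""])[0]
            zmw, length = "/".join(name.split("/")[:2]), 0
        elif line:
            length += len(line)  # sequence may span several lines
    if zmw is not None:
        yield zmw, length


def keep_longest_per_zmw(reads: Iterable[Tuple[str, int]], max_zmws: int) -> Dict[str, int]:
    """Keep the longest read per ZMW, retaining at most max_zmws ZMWs."""
    best: Dict[str, int] = {}          # zmw -> longest read length kept so far
    heap: list = []                    # (length, zmw) min-heap; may hold stale entries
    for zmw, length in reads:
        if length <= best.get(zmw, -1):
            continue                   # a longer read for this ZMW is already kept
        best[zmw] = length
        heapq.heappush(heap, (length, zmw))
        if len(best) > max_zmws:
            # Pop until an entry matches a live dictionary value, then drop it.
            while True:
                old_len, old_zmw = heapq.heappop(heap)
                if best.get(old_zmw) == old_len:
                    del best[old_zmw]
                    break
    return best
```

Because kept entries are only ever replaced by longer reads, the shortest retained length never decreases, so the result is the `max_zmws` ZMWs with the longest reads (ties broken arbitrarily). Reads from several files can be streamed together with `itertools.chain` over the per-file generators.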
NOTE: This script has not been tested, as we decided to move forward with the C++ script written by Jonas.