downsample_chua

This Python script (`downsample_longest.py`) first iterates through one of the input FASTA files to count its records. The total number of ZMW records to retain is four times that count, multiplied by the percentage to retain. In our case, that was 75%.
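As a rough illustration of the threshold arithmetic described above, here is a minimal sketch. The function names `count_fasta_records` and `zmw_threshold` and the `multiplier` parameter are hypothetical and are not taken from `downsample_longest.py`; the sketch assumes one header line starting with `>` per record.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def zmw_threshold(records_in_one_file, retain_fraction=0.75, multiplier=4):
    """Number of ZMW records to keep: count * multiplier * retain_fraction.

    multiplier=4 and retain_fraction=0.75 mirror the values given in this README.
    """
    return int(records_in_one_file * multiplier * retain_fraction)


if __name__ == "__main__":
    example = [">read1", "ACGT", ">read2", "GGCC", "TTAA"]
    n = count_fasta_records(example)
    print(n, zmw_threshold(n))
```

With a real file, the lines would come from an open file handle, for example `with open(path) as fh: count_fasta_records(fh)`.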

The script then loops through the FASTA files and retains the longest ZMW reads. It does this by storing the length information in a dictionary with the ZMW numbers as the values. A counter keeps track of the number of ZMWs seen. Once the counter reaches the threshold described above, the shortest reads are removed. If a ZMW is seen more than once, only its longest read is retained.
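The selection step could be sketched as follows. This is not the code in `downsample_longest.py`: for simplicity it keys the dictionary by ZMW number (a different layout from the one described above), and it assumes PacBio-style read names of the form `movie/zmw/start_end`. The helper names `zmw_of` and `keep_longest` are hypothetical.

```python
def zmw_of(header):
    """Extract the ZMW number from a header like 'movie/1234/0_5000' (assumed format)."""
    return header.split()[0].split("/")[1]


def keep_longest(records, threshold):
    """Keep at most `threshold` ZMWs, each represented by its longest read.

    `records` is an iterable of (header, sequence) pairs.
    """
    best = {}  # ZMW number -> (header, sequence) of the longest read seen so far
    for header, seq in records:
        zmw = zmw_of(header)
        # A ZMW seen more than once keeps only its longest read.
        if zmw not in best or len(seq) > len(best[zmw][1]):
            best[zmw] = (header, seq)
        # Once more ZMWs are held than the threshold allows, drop the shortest.
        if len(best) > threshold:
            shortest = min(best, key=lambda z: len(best[z][1]))
            del best[shortest]
    return best


if __name__ == "__main__":
    reads = [
        ("m/1/0_4", "ACGT"),
        ("m/2/0_2", "AC"),
        ("m/1/4_10", "ACGTAC"),
        ("m/3/0_3", "ACG"),
    ]
    for zmw, (header, seq) in keep_longest(reads, threshold=2).items():
        print(zmw, header, len(seq))
```

The linear `min` scan on every eviction keeps the sketch short but is slow for large inputs; a heap or a single sort at the end would scale better.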

NOTE: This script has not been tested, as we decided to move forward with the C++ script created by Jonas.
