Version 1.2 - January 10, 2026
By Alan Rockefeller
A tool for automatically fixing the orientation of fungal ITS sequences in FASTA files. Perfect for cleaning up sequences downloaded from databases like MycoMap or GenBank where some sequences might be reverse-complemented.
Ever downloaded a bunch of ITS sequences only to find that some are in the wrong orientation, making your phylogenetic tree completely useless until you manually reverse-complement them? This tool automatically detects and fixes that problem by looking for conserved motifs in the ITS region. It flips sequences that are backwards, giving you a clean FASTA file ready for phylogenetic analysis.
- Smart orientation detection using three conserved ITS motifs (ITS1-F, 5.8S core, ITS4)
- Conservative Reversal Gate: Sequences are only reversed if there is strong evidence (core + flank presence and high-quality core hit)
- IUPAC-aware fuzzy matching - handles ambiguous nucleotides
- Robustness: 'N' and '-' are treated as mismatches to prevent spurious hits in low-quality regions
- Automatic repair of malformed FASTA files with stray '>' symbols
- Detailed reporting of which sequences were reversed (or silent operation with -q)
- Fast and efficient - processes large files quickly using a high-performance sliding window scan
- Cross-platform consistent: Ensures UTF-8 encoding and Unix line endings (\n) for all output
Just download the script and make it executable:
wget https://raw.githubusercontent.com/AlanRockefeller/fixfasta/main/fixfasta.py
chmod +x fixfasta.py
Requirements:
- Python 3.7+
- No external dependencies - uses only Python standard library
Basic usage - fix orientation and see what was changed:
./fixfasta.py sequences.fasta > fixed_sequences.fasta
The script will report which sequences it reversed:
=== Reversed sequences (2) ===
DQ422012_Russula_ochrospora
iNat180216325_Russula_sp
Process multiple files:
cat *.fasta | ./fixfasta.py > all_fixed.fasta
Silent mode (no reports):
./fixfasta.py input.fasta -q > output.fasta
See detailed statistics:
./fixfasta.py input.fasta --stats > output.fasta
Verbose mode to understand decisions:
./fixfasta.py input.fasta -v > output.fasta
Dry run (analyze without modifying):
./fixfasta.py input.fasta --dry-run
Save to a specific output file:
./fixfasta.py input.fasta -o output.fasta
-h, --help Show help message
-o, --output Output file (default: stdout)
-n, --dry-run Don't write output, just analyze
-v, --verbose Verbose output showing decision process
-s, --stats Print orientation statistics
--stats-only Only print statistics, no sequence output
-q, --quiet Suppress all warnings and reports
--max-mismatches N Maximum mismatches per motif (default: 4).
Note: Only substitutions are counted (no indels).
The tool uses three conserved motifs commonly found in fungal ITS sequences:
- ITS1-F (TCCGTAGGTGAACCTGCGG) - found at the 18S end
- 5.8S core (GCATCGATGAAGAACGCAGC) - middle region
- ITS4 (TCCTCCGCTTATTGATATGC) - found at the 28S start
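Searching in both orientations means each motif is also needed as its reverse complement. The following is a minimal sketch of how those reverse-complement motifs can be derived; the names `reverse_complement`, `MOTIFS` and `REVERSE_MOTIFS` are illustrative and are not taken from fixfasta.py itself.

```python
# Complement table for A/C/G/T plus the IUPAC ambiguity codes (both cases).
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# The three motifs listed above, written in the forward orientation.
MOTIFS = {
    "ITS1-F": "TCCGTAGGTGAACCTGCGG",
    "5.8S core": "GCATCGATGAAGAACGCAGC",
    "ITS4": "TCCTCCGCTTATTGATATGC",
}

# What the same motifs look like inside a reverse-complemented sequence.
REVERSE_MOTIFS = {name: reverse_complement(m) for name, m in MOTIFS.items()}

if __name__ == "__main__":
    for name, motif in REVERSE_MOTIFS.items():
        print(f"{name}: {motif}")
```

A sequence that contains the `REVERSE_MOTIFS` strings rather than the `MOTIFS` strings is a candidate for flipping.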
For each sequence, it:
- Searches for these motifs in both orientations (forward and reverse-complement).
- Uses a fast sliding window to count mismatches (substitutions only).
- Primary Winner:
- Which orientation has more distinct motif hits.
- If tied, which has fewer total mismatches.
- If still tied, which has the earliest best hit.
- Conservative Reversal Gate: Even if "reverse" is the primary winner, the sequence is only reversed if:
- It contains a hit for the 5.8S core AND at least one flanking motif (ITS1-F or ITS4).
- The 5.8S core hit in the reverse orientation is high quality (≤ 2 mismatches).
- The motifs appear in the correct relative order (ITS4-rev before core, ITS1F-rev after core).
- If the reversal gate fails, it stays "forward" (or "uncertain" if no hits at all were found).
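The steps above can be sketched roughly as follows. This is an illustration of the described logic, not the actual fixfasta.py code: all function names are hypothetical, matching is exact (no IUPAC handling), and "earliest best hit" is read here as the start position of the lowest-mismatch hit, which is an assumption.

```python
from typing import Dict, Optional, Tuple

MOTIFS = {
    "ITS1-F": "TCCGTAGGTGAACCTGCGG",
    "5.8S core": "GCATCGATGAAGAACGCAGC",
    "ITS4": "TCCTCCGCTTATTGATATGC",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
Hit = Tuple[int, int]  # (mismatches, start position)


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_hit(seq: str, motif: str, max_mismatches: int) -> Optional[Hit]:
    """Slide the motif along seq, counting substitutions only (no indels)."""
    best = None
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
            best = (mismatches, start)
    return best


def scan(seq: str, motifs: Dict[str, str], max_mismatches: int) -> Dict[str, Hit]:
    """Return the best hit for every motif that is found at all."""
    hits = {}
    for name, motif in motifs.items():
        hit = best_hit(seq, motif, max_mismatches)
        if hit is not None:
            hits[name] = hit
    return hits


def rank(hits: Dict[str, Hit]) -> Tuple[int, int, float]:
    """Smaller is better: more motifs, fewer mismatches, earlier best hit."""
    if not hits:
        return (0, 0, float("inf"))
    _, best_pos = min(hits.values())
    return (-len(hits), sum(mm for mm, _ in hits.values()), best_pos)


def decide(seq: str, max_mismatches: int = 4, core_max: int = 2) -> str:
    seq = seq.upper()
    fwd = scan(seq, MOTIFS, max_mismatches)
    rev = scan(seq, {n: revcomp(m) for n, m in MOTIFS.items()}, max_mismatches)
    if not fwd and not rev:
        return "uncertain"
    if rank(rev) < rank(fwd):  # reverse is the primary winner
        core = rev.get("5.8S core")
        its1, its4 = rev.get("ITS1-F"), rev.get("ITS4")
        # Gate: good core hit, at least one flank, motifs in the right order.
        if core and core[0] <= core_max and (its1 or its4):
            its4_ok = its4 is None or its4[1] < core[1]
            its1_ok = its1 is None or its1[1] > core[1]
            if its4_ok and its1_ok:
                return "reverse"
    return "forward"
```

Calling `decide(sequence)` returns `"forward"`, `"reverse"` or `"uncertain"`; only a `"reverse"` result would lead to the sequence being flipped.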
The fuzzy matching understands IUPAC ambiguity codes (R, Y, S, W, K, M, etc.). Note that 'N' and '-' are always treated as mismatches to avoid false positives in low-quality sequence data.
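A minimal sketch of IUPAC-aware comparison is shown below. It assumes two codes match when the sets of bases they stand for overlap; the exact rule used by fixfasta.py is not stated here, and the names `IUPAC`, `bases_match` and `count_mismatches` are illustrative.

```python
# Bases represented by each IUPAC code. 'N' and '-' are deliberately left out
# so that they never match anything.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def bases_match(a: str, b: str) -> bool:
    """True if the two codes share at least one possible base."""
    set_a = IUPAC.get(a.upper())
    set_b = IUPAC.get(b.upper())
    if set_a is None or set_b is None:  # 'N', '-', or any unknown symbol
        return False
    return bool(set(set_a) & set(set_b))


def count_mismatches(window: str, motif: str) -> int:
    """Count substitutions between two equal-length strings."""
    return sum(not bases_match(a, b) for a, b in zip(window, motif))


if __name__ == "__main__":
    # 'Y' (C or T) is compatible with 'C'; the trailing 'N' counts as a mismatch.
    print(count_mismatches("GCATYGATGN", "GCATCGATGA"))
```

Treating 'N' as a mismatch means a long run of Ns cannot "match" every motif at once, which is the false-positive case described above.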
This tool was originally created to process ITS sequences downloaded from MycoMap for phylogenetic tree construction. It's particularly useful when combining sequences from multiple sources where orientation consistency isn't guaranteed.
Example workflow:
# Download sequences from MycoMap
# ... download process ...
# Fix orientations
./fixfasta.py mycomap.fa ncbi.fa > sequences_oriented.fasta
# Now ready for MAFFT alignment, RAxML, etc.
mafft sequences_oriented.fasta > aligned.fasta
- The default behavior shows you what was changed - use -q for silent operation in pipelines.
- Use --dry-run first on new datasets to see what would be changed.
- The tool preserves sequence names exactly.
- Handles messy FASTA files gracefully (like those with stray '>' symbols).
- All diagnostic output goes to stderr, so stdout piping remains clean.
Version 1.0 – June 30, 2025
A tiny helper that grabs the FASTA files behind a MycoMap BLAST result page – no clicking, no copy‑and‑paste, just the files on disk ready to go. As of v1.1 it also tells you how many sequences were retrieved.
- Pulls the NCBI‑side FASTA (ncbi_<ID>.fasta) and the local MycoMap FASTA (myco_<ID>.fasta) for a given BLAST job.
- Prints the download time, file size and sequence count for each file.
- Names everything with the BLAST numeric ID.
python getfasta.py https://mycomap.com/genetics/blast-search/a04-inat237420128-1-ric77-332392-r265167/
Output looks like this:
Downloading FASTA files for MycoBLAST ID: 265167
NCBI downloaded in 2.85s (74953 bytes, 97 sequences)
MycoBLAST downloaded in 1.02s (36161 bytes, 50 sequences)
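To double-check a reported sequence count, you can count header lines in the downloaded file yourself. This is a small sketch that assumes one '>' header line per record; how getfasta.py computes its own count is not documented here, and `count_fasta_records` is a hypothetical helper name.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # In-memory stand-in for a downloaded file; replace with open(path) to
    # check a real download.
    example = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nGGCC\n")
    print(count_fasta_records(example))
```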
This project is licensed under the MIT License:
MIT License
Copyright (c) 2026 Alan Rockefeller
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Found a bug? Have a suggestion? Feel free to open an issue or submit a pull request!
https://github.com/AlanRockefeller/fixfasta.py
Thanks to the mycological community for providing the data that made this tool necessary, and to everyone who's contributed sequences to public databases.