SpliceMap

Simple pipeline for annotating splicing regulatory elements on genomic DNA sequences. Given a GenBank file with exon annotations (or an mRNA accession to discover them), splicemap maps splice sites, branch points, polypyrimidine tracts, and exonic splicing enhancers and silencers with the best available tools. Annotations are color-coded and written back to the GenBank file for viewing in SnapGene or UGENE. A markdown report is generated alongside.

Quick Start

git clone https://github.com/maxwraae/splicemap.git
cd splicemap
pip install -r requirements.txt
# Download MECP2 RefSeqGene from NCBI (NG_007107), then:
python splicemap.py splicemap NG_007107.gb -t NM_004992.4

Input

Download your gene as a RefSeqGene from NCBI Gene. These files come with exon annotations already included, which splicemap uses to find introns. If your GenBank file doesn't have exon annotations, pass an mRNA transcript accession with -t (e.g. -t NM_004992.4) and splicemap will discover them by alignment.

What You See

Annotations are named by what they are, not which tool found them. Colors group by function. Shades indicate relative confidence. Double-click any annotation in SnapGene to see the tool, score, motif, and other details.

Annotation	Color	What it is
Branch Point	Orange (dark to light by rank)	Candidate branch point adenosine
Polypyrimidine Tract	Amber	Pyrimidine-rich region between BPS and 3'SS
5' Splice Site	Teal	Donor splice site (GT)
3' Splice Site	Teal (lighter)	Acceptor splice site (AG)
Splice Enhancer (blue)	Blue (shades)	SR protein binding site (ESEfinder)
Splice Enhancer (green)	Green	Hexamer with positive splicing activity (ESRseq)
Splice Silencer (red)	Red (shades)	Hexamer or motif that suppresses exon inclusion
U2AF65 binding site	Darker amber	Predicted 9-nt U2AF65 binding register within PPT

MECP2 intron 2 / exon 3 junction. Branch points (orange), PPT (amber), 3'SS (teal), splice enhancers (blue = ESEfinder, green = ESRseq), splice silencers (red).

Double-click any annotation to see the tool, protein, score, and position.

Example Output

Splice Map: MECP2_CS (76,145 bp, linear)
============================================================

Intron                 Length     5'SS     3'SS   BPS(z)   PPT%  U-run
-------------------- -------- -------- -------- -------- ------ ------
intron 1               5,296      7.9     10.8      6.5    80%      4
intron 2              59,626     10.9      5.3      6.1    67%      3
intron 3                 756     10.1     12.4      6.6    80%      3

Exon                 Length  ESEfinder  hnRNP  ESRseq+  ESRseq-
-------------------- ------  ---------  -----  -------  -------
exon_us_intron_1       114         25      0       63       13
exon_ds_intron_1       124         22      0       24       37
exon_ds_intron_2       351         57      1      118       69
exon_ds_intron_3      9878       1436     55     2232     3125

Methods

Splice sites

MaxEntScan (Yeo & Burge 2004). 5'SS scored on a 9-mer (3 exonic + 6 intronic), 3'SS on a 23-mer (20 intronic + 3 exonic). Log-odds scores. Above 6 is strong, 3-6 moderate, below 3 weak. Non-canonical dinucleotides are flagged.

Branch points

BPP (PWM trained on verified human branch points) and SVM-BPfinder (SVM classifier). Both run independently on each intron. Up to 4 candidates shown, ranked by score across both tools. Darker orange = higher confidence.

Branch point prediction is roughly 75-80% accurate. No tool reliably identifies the correct branch point across all intron contexts.

Polypyrimidine tract

Defined as the region between the top branch point candidate and the 3'SS AG. Reports length, pyrimidine percentage, and longest uninterrupted U-run. The U-run is the most informative single feature for U2AF65 binding (crystal structures show its two RRM domains each grab 4-5 uridines).

No validated computational model for U2AF65 binding affinity exists. PPT is scored by composition, which is standard practice. The PPT window depends on the branch point prediction.

U2AF65 binding site prediction. Within the PPT, splicemap predicts the optimal 9-nucleotide register where U2AF65's tandem RRM domains bind. Scoring uses nucleotide-level log-odds derived from U2AF65 SELEX composition (Banerjee et al. 2003): uridine is strongly preferred (log-odds +0.94), cytosine slightly preferred (+0.11), purines penalized (-1.83). RRM2 positions (5' end of footprint) are weighted 1.5x because RRM2 makes more sequence-specific contacts (Sickmier et al. 2006). This is an approximation; the full S65 pentamer table (Erkelenz et al. 2008) would provide dinucleotide context effects.

Exonic splicing enhancers and silencers

Two methods, measuring different things.

ESEfinder (Cartegni et al. 2003). Position weight matrices from SELEX experiments for four SR proteins: SRSF1, SRSF2, SRSF5, SRSF6. Tells you which protein binds where. Only covers 4 of ~12 SR proteins. In vitro binding preference does not always match in vivo function. ~44% accuracy on known splicing mutations.

ESRseq (Ke et al. 2011). All 4,096 possible hexamers tested in a minigene assay and scored by RNA-seq. Positive score = promotes exon inclusion (enhancer). Negative = promotes skipping (silencer). Captures the combined effect of all proteins that bind a given sequence. Does not identify which protein is responsible. ~83% accuracy on known splicing mutations.

hnRNP motifs. Pattern matching for hnRNP A1 (Burd & Dreyfuss 1994) and hnRNP H G-runs (Caputi & Bhatt 2003).

Limitations

Flat sequence only. No RNA secondary structure.
No positional weighting (ESEs near splice sites matter more than those mid-exon).
No combinatorial effects between adjacent elements.
No cell-type or tissue specificity.
Branch point prediction accuracy is inherently limited. PPT analysis depends on it.

Commands

Reading and inspection

Command	Description
`read <file>`	Parse .gb/.fasta, show summary
`features <file>`	List all annotations
`seq <file> <start> <end>`	Extract sequence (1-based)
`translate <file> <start> <end>`	Translate a region
`search <file> <sequence>`	Find motif occurrences (both strands)
`orfs <file>`	Find open reading frames
`sites <file>`	Find restriction sites
`open <file>`	Open in default viewer

Annotation

Command	Description
`splicemap <file> -t <accession>`	Full splice map
`exons <file> -t <accession>`	Find and annotate exon boundaries
`annotate <file> <start> <end> <label>`	Add a feature
`annotate-seq <file> <sequence> <label>`	Find and annotate a sequence
`splice-signals <file>`	Annotate splice signals on detected introns
`branchpoint <file>`	Predict branch points
`remove <file> <label>`	Remove an annotation

Sequence editing

Command	Description
`insert <file> <pos> <seq>`	Insert sequence, shift features
`delete <file> <start> <end>`	Delete region, shift features
`replace <file> <start> <end> <seq>`	Replace region
`revcomp <file>`	Reverse complement

Analysis

Command	Description
`diff <file1> <file2>`	Compare two constructs
`blast <file>`	Remote NCBI BLAST
`stitch <file> [labels...]`	Stitch regions, optionally translate
`check <file>`	Preflight validation
`gibson <file> --enzymes E1,E2 --insert SEQ`	Design Gibson assembly
`varmap <file> <variants_csv>`	Map variant positions
`export <file> <format>`	Convert format (fasta, genbank, tab)

Dependencies

Python 3.8+
pip install -r requirements.txt  # biopython, pydna

Branch point tools (BPP, SVM-BPfinder) are downloaded automatically on first use.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
images		images
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
splicemap.py		splicemap.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpliceMap

Quick Start

Input

What You See

Example Output

Methods

Splice sites

Branch points

Polypyrimidine tract

Exonic splicing enhancers and silencers

Limitations

Commands

Reading and inspection

Annotation

Sequence editing

Analysis

Dependencies

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SpliceMap

Quick Start

Input

What You See

Example Output

Methods

Splice sites

Branch points

Polypyrimidine tract

Exonic splicing enhancers and silencers

Limitations

Commands

Reading and inspection

Annotation

Sequence editing

Analysis

Dependencies

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages