Add support for STR prioritisation from ExpansionHunter calls #563

@julesjacobsen

Description

ExpansionHunter is used at Genomics England to detect short tandem repeat (STR) expansions from short-read sequencing. Example output: https://github.com/Illumina/ExpansionHunter/blob/master/docs/06_OutputVcfFiles.md#example

The following VCF entry describes the state of the C9orf72 repeat in a sample with name/barcode LP6005616-DNA_A03.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  LP6005616-DNA_A03
chr9    27573526        .       C       <STR2>,<STR349> .       PASS    SVTYPE=STR;END=27573544;REF=3;RL=18;RU=GGCCCC;REPID=ALS GT:SO:CN:CI:AD_SP:AD_FL:AD_IR   1/2:SPANNING/INREPEAT:2/349:2-2/323-376:19/0:3/6:0/459

This line tells us that the first allele spans 2 repeat units while the second allele spans 349 repeat units. The repeat unit is GGCCCC (RU INFO field), so the sequence of the first allele is GGCCCCGGCCCC and the sequence of the second allele is GGCCCC x 349. The repeat spans three repeat units in the reference (REF INFO field). The length of the short allele was estimated from spanning reads (SPANNING) while the length of the expanded allele was estimated from in-repeat reads (INREPEAT). The confidence interval for the size of the expanded allele is (323,376). There are 19 spanning and 3 flanking reads consistent with the repeat allele of size 2 (that is, 19 reads fully contain the repeat of size 2 and 3 flanking reads overlap at most 2 repeat units). There are also 6 flanking and 459 in-repeat reads consistent with the repeat allele of size 349.
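As a rough illustration of what reading such a record could involve, here is a minimal Python sketch that pulls the per-allele values out of the example line above. The names `StrAllele` and `parse_str_record` are hypothetical and not part of ExpansionHunter or any existing codebase. The sketch assumes a single-sample record with two called alleles and no missing (`.`) values, and it splits columns on whitespace, which only works when no field contains spaces.

```python
from dataclasses import dataclass


@dataclass
class StrAllele:
    """Per-allele values from an ExpansionHunter sample column (hypothetical type)."""
    repeat_units: int    # CN: estimated number of repeat units
    support: str         # SO: read type used for the estimate
    ci: tuple            # CI: (lower, upper) bounds in repeat units
    spanning_reads: int  # AD_SP
    flanking_reads: int  # AD_FL
    inrepeat_reads: int  # AD_IR


def parse_str_record(line):
    """Parse one single-sample ExpansionHunter VCF data line into a dict."""
    cols = line.split()  # assumption: no whitespace inside any field

    # INFO is a ';'-separated list of KEY=VALUE pairs.
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value

    # FORMAT keys and sample values are both ':'-separated and line up by position.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))

    # Each of these fields holds one '/'-separated value per allele.
    keys = ("CN", "SO", "CI", "AD_SP", "AD_FL", "AD_IR")
    alleles = []
    for cn, so, ci, sp, fl, ir in zip(*(sample[k].split("/") for k in keys)):
        low, high = ci.split("-")
        alleles.append(
            StrAllele(int(cn), so, (int(low), int(high)), int(sp), int(fl), int(ir))
        )

    return {
        "chrom": cols[0],
        "start": int(cols[1]),
        "end": int(info["END"]),
        "repeat_id": info["REPID"],
        "repeat_unit": info["RU"],
        "ref_units": int(info["REF"]),
        "alleles": alleles,
    }


# The example record from the ExpansionHunter documentation, tab-delimited.
LINE = "\t".join([
    "chr9", "27573526", ".", "C", "<STR2>,<STR349>", ".", "PASS",
    "SVTYPE=STR;END=27573544;REF=3;RL=18;RU=GGCCCC;REPID=ALS",
    "GT:SO:CN:CI:AD_SP:AD_FL:AD_IR",
    "1/2:SPANNING/INREPEAT:2/349:2-2/323-376:19/0:3/6:0/459",
])

record = parse_str_record(LINE)
for allele in record["alleles"]:
    print(record["repeat_id"], record["repeat_unit"], allele)
```

A real implementation would probably build on an existing VCF parser rather than string splitting, and would need to handle hemizygous calls and missing values.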

PanelApp has info on the pathogenicity for STRs e.g.
https://panelapp.genomicsengland.co.uk/panels/entities/C9orf72_GGGGCC
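One way the prioritisation step might look, sketched in Python: compare the confidence interval of each allele against a normal and a pathogenic repeat-count threshold for the locus. The function `classify_repeat` and the threshold values below are placeholders for illustration only. They are not taken from PanelApp, and how the real thresholds would be fetched is left open.

```python
def classify_repeat(ci, normal_max, pathogenic_min):
    """Classify a repeat-count confidence interval against two thresholds.

    ci is a (lower, upper) pair in repeat units. The call is only made when
    the whole interval falls on one side of the relevant threshold.
    """
    lower, upper = ci
    if lower >= pathogenic_min:
        return "pathogenic"
    if upper <= normal_max:
        return "normal"
    return "uncertain"


# Placeholder thresholds, NOT real values for any locus.
NORMAL_MAX = 10
PATHOGENIC_MIN = 100

# Confidence intervals of the two alleles in the example record above.
short_allele = classify_repeat((2, 2), NORMAL_MAX, PATHOGENIC_MIN)
expanded_allele = classify_repeat((323, 376), NORMAL_MAX, PATHOGENIC_MIN)
print(short_allele, expanded_allele)
```

Using the interval rather than the point estimate keeps borderline calls flagged as uncertain instead of forcing them into one category. Whether that is the right policy here is a design question for the issue.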
