
Bug: Failure to detect tags. Specifying unrelated tags in the config somehow masks other tags. #51

@bentyeh

Description


Consider this read, which contains a series of tags (Y18, O45, E41, O21, E2, O85), where Y, O, and E denote terminal, odd, and even tags from a SPRITE-based experiment. Note that the E2 tag carries a single mismatch (a C-to-A substitution), indicated by the orange arrow.

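[Image: the read annotated with tags Y18, O45, E41, O21, E2, and O85; an orange arrow marks the C-to-A substitution in E2]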
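As a quick check that is independent of splitcode, the sketch below compares each tag sequence from the config against the read window where it was eventually reported. The read is the one printed in splitcode's FASTA output in this issue, and the 0-based, end-exclusive starts come from the LX:Z output obtained with the reduced config. It counts substitutions at a fixed offset only (a plain Hamming comparison), so it does not model the indels that the distances column also permits, and it is not how splitcode itself searches.

```python
# Read sequence exactly as printed in splitcode's FASTA output in this issue.
READ = "TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGCACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA"

# (tag sequence from config.tsv, 0-based start from the LX:Z output where all six were found)
EXPECTED = {
    "Y18": ("TCTCCTTACG", 0),
    "O45": ("GCCTAGTAGAAGACGTT", 17),
    "E41": ("ATAGATTGTTGCGTGCT", 41),
    "O21": ("GGATAGCACCGTTCATT", 65),
    "E2": ("TGTAGGTTCTGGAATCT", 89),
    "O85": ("GCTGTGTCTGTCACCT", 113),
}


def mismatches(tag, start, read=READ):
    """Return (read_position, tag_base, read_base) for each substitution at this offset."""
    window = read[start:start + len(tag)]
    return [(start + i, t, r) for i, (t, r) in enumerate(zip(tag, window)) if t != r]


for name, (seq, start) in EXPECTED.items():
    # E2 -> [(104, 'C', 'A')]; the other five tags -> []
    print(name, mismatches(seq, start))
```

Only E2 differs from the read, by one substitution, which is within the 1:1:1 distance allowed for it in the config.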

Using the following config file, splitcode fails to detect the E2 and O85 tags:

groups ids tags distances locations previous
NYStgBot Y18 TCTCCTTACG 1:1:1 0:0:11 -
NYStgBot Y32 TGTAGTTCTA 1:1:1 0:0:11 -
OddBot3 O45 GCCTAGTAGAAGACGTT 2:2:2 0:11:0 {{NYStgBot}}4-10
EvenBot2 E41 ATAGATTGTTGCGTGCT 2:2:2 0:11:0 {{OddBot3}}4-10
OddBot2 O21 GGATAGCACCGTTCATT 1:1:1 0:11:0 {{EvenBot2}}4-10
EvenBot1 E2 TGTAGGTTCTGGAATCT 1:1:1 0:11:0 {{OddBot2}}4-10
OddBot1 O85 GCTGTGTCTGTCACCT 1:1:1 0:11:0 {{EvenBot1}}4-10
DPM_const DPM_const TCATGTCTTCCGATCT 2:0:2 0:11:0 {{OddBot1}}4-10
DPM_R2 DPM1 TGGGTGTTT 1:0:1 0:11:0 {DPM_const}0-0
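To confirm that the read's layout is consistent with this config, the sketch below parses the rows and checks each previous constraint against the positions splitcode reported once E2 and O85 were detected. It rests on an assumed reading of the format: {{group}}min-max means the tag must follow a tag from that group, {id}min-max means it must follow that specific tag, and min-max is the number of bases between the end of the earlier tag and the start of this one. Check the splitcode README before relying on those semantics; parse_config and check_spacing are hypothetical helpers, not part of splitcode.

```python
import re

# Config rows copied from this issue (whitespace-split: no field contains spaces).
CONFIG = """\
groups ids tags distances locations previous
NYStgBot Y18 TCTCCTTACG 1:1:1 0:0:11 -
NYStgBot Y32 TGTAGTTCTA 1:1:1 0:0:11 -
OddBot3 O45 GCCTAGTAGAAGACGTT 2:2:2 0:11:0 {{NYStgBot}}4-10
EvenBot2 E41 ATAGATTGTTGCGTGCT 2:2:2 0:11:0 {{OddBot3}}4-10
OddBot2 O21 GGATAGCACCGTTCATT 1:1:1 0:11:0 {{EvenBot2}}4-10
EvenBot1 E2 TGTAGGTTCTGGAATCT 1:1:1 0:11:0 {{OddBot2}}4-10
OddBot1 O85 GCTGTGTCTGTCACCT 1:1:1 0:11:0 {{EvenBot1}}4-10
DPM_const DPM_const TCATGTCTTCCGATCT 2:0:2 0:11:0 {{OddBot1}}4-10
DPM_R2 DPM1 TGGGTGTTT 1:0:1 0:11:0 {DPM_const}0-0
"""

# Assumed syntax: "{{group}}min-max" references a group, "{id}min-max" a single tag.
PREV_RE = re.compile(r"^(\{\{|\{)(\w+)\}+(\d+)-(\d+)$")

# 0-based, end-exclusive positions from the LX:Z output in which all six tags were found.
HITS = {"Y18": (0, 10), "O45": (17, 34), "E41": (41, 58),
        "O21": (65, 82), "E2": (89, 106), "O85": (113, 129)}


def parse_config(text):
    """Parse the whitespace-separated config into dicts, adding a 'prev' tuple."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    for row in rows:
        m = PREV_RE.match(row["previous"])
        if m is None:  # "-" means no preceding-tag constraint
            row["prev"] = None
        else:
            kind = "group" if m.group(1) == "{{" else "id"
            row["prev"] = (kind, m.group(2), int(m.group(3)), int(m.group(4)))
    return rows


def check_spacing(rows, hits):
    """For each hit with a 'previous' rule, return (gap_bp, gap within min-max)."""
    group_of = {row["ids"]: row["groups"] for row in rows}
    results = {}
    for row in rows:
        if row["ids"] not in hits or row["prev"] is None:
            continue
        kind, ref, lo, hi = row["prev"]
        earlier = [t for t in hits if (group_of[t] if kind == "group" else t) == ref]
        if not earlier:
            continue
        gap = hits[row["ids"]][0] - hits[earlier[0]][1]
        results[row["ids"]] = (gap, lo <= gap <= hi)
    return results


for tag, (gap, ok) in check_spacing(parse_config(CONFIG), HITS).items():
    print(f"{tag}: {gap} bp after previous tag, within range: {ok}")
```

Under those assumptions every link in the chain has a 7 bp gap, inside the configured 4-10 window, so the spacing rules alone should not prevent E2 and O85 from being found.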
# splitcode command
splitcode -c config.tsv --loc-names --out-fasta --pipe input.fastq

# output
>AV233703:20251015_Guttman_MRB:2512583691:1:10102:0992:0223 LX:Z:Y18:0,0-10,O45:0,17-34,E41:0,41-58,O21:0,65-82
TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGCACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA
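To see at a glance which tags are absent, the LX:Z field of the FASTA header can be parsed and compared against the six tags expected in this read. This is a hypothetical helper: it assumes each entry has the form name:file,start-end, which is how the entries in this output appear when --loc-names is used.

```python
import re

# One LX:Z entry looks like "O21:0,65-82": tag name, file index, start-end.
LX_ENTRY = re.compile(r"([^:,]+):(\d+),(\d+)-(\d+)")


def parse_lx(header):
    """Map tag name -> (file, start, end) from the LX:Z field of a FASTA header."""
    field = next(tok for tok in header.split() if tok.startswith("LX:Z:"))
    return {name: (int(f), int(s), int(e))
            for name, f, s, e in LX_ENTRY.findall(field[len("LX:Z:"):])}


# Header produced with the full config.tsv in this issue.
HEADER = (">AV233703:20251015_Guttman_MRB:2512583691:1:10102:0992:0223 "
          "LX:Z:Y18:0,0-10,O45:0,17-34,E41:0,41-58,O21:0,65-82")
EXPECTED_TAGS = ["Y18", "O45", "E41", "O21", "E2", "O85"]

detected = parse_lx(HEADER)
missing = [t for t in EXPECTED_TAGS if t not in detected]
print("detected:", detected)
print("missing:", missing)  # ['E2', 'O85']
```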

However, after removing the Y32 tag, the DPM1 tag, or both from the config file, splitcode detects the E2 and O85 tags:

# splitcode command
splitcode -c config_updated.tsv --loc-names --out-fasta --pipe input.fastq

# output
>AV233703:20251015_Guttman_MRB:2512583691:1:10102:0992:0223 LX:Z:Y18:0,0-10,O45:0,17-34,E41:0,41-58,O21:0,65-82,E2:0,89-106,O85:0,113-129
TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGCACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA
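The three reduced configs (without Y32, without DPM1, without both) can be generated by filtering rows on the ids column. This is a minimal sketch: drop_tags and the output file names are hypothetical, since the issue itself only mentions config_updated.tsv, and it splits on any whitespace so it works whether the file uses tabs or spaces.

```python
from pathlib import Path


def drop_tags(config_text, ids_to_drop):
    """Return config_text without the rows whose ids column is in ids_to_drop."""
    lines = [line for line in config_text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    id_col = header.split().index("ids")
    kept = [row for row in rows if row.split()[id_col] not in ids_to_drop]
    return "\n".join([header, *kept]) + "\n"


# Hypothetical output names for the three variants described in this issue.
VARIANTS = {
    "config_no_Y32.tsv": {"Y32"},
    "config_no_DPM1.tsv": {"DPM1"},
    "config_no_Y32_DPM1.tsv": {"Y32", "DPM1"},
}

if __name__ == "__main__" and Path("config.tsv").exists():
    original = Path("config.tsv").read_text()
    for out_name, ids in VARIANTS.items():
        Path(out_name).write_text(drop_tags(original, ids))
```

Each variant can then be passed via -c to the same splitcode command used in this issue.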

(The fact that DPM_const and DPM_R2 would extend past the end of the read is irrelevant: padding the 3' end of the read with arbitrary sequence does not appear to affect detection of E2 and O85.)
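The padding control can be reproduced with a helper that appends bases to the 3' end of a FASTQ record and extends its quality string to the same length. This is a hypothetical sketch: the reporter's actual padding sequence is not given, so it uses seeded pseudo-random bases, and it assumes Phred+33 qualities, where I encodes Q40.

```python
import random


def random_pad(n, seed=0):
    """Return n pseudo-random bases; seeded so the padded file is reproducible."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def pad_fastq_record(record, pad, qual_char="I"):
    """Append pad to the 3' end of a 4-line FASTQ record, padding qualities to match."""
    header, seq, plus, qual = record.splitlines()[:4]
    return "\n".join([header, seq + pad, plus, qual + qual_char * len(pad)]) + "\n"


print(pad_fastq_record("@read1\nACGT\n+\nFFFF\n", random_pad(8)), end="")
```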

Executable Google Colab notebook: https://colab.research.google.com/drive/1JbHxLV3mXtD_qv3SUagfafiiq_FdAVXV
