Description
Consider this read, which carries the tag series Y18, O45, E41, O21, E2, O85, where Y, O, and E denote terminal, odd, and even tags from a SPRITE-based experiment. Note that the E2 tag has a single mismatch, a C-to-A substitution at its second-to-last base (the config has TGTAGGTTCTGGAATCT, the read has TGTAGGTTCTGGAATAT), as indicated by the orange arrow.
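The single-mismatch claim can be checked independently of splitcode. The following is a minimal Python sketch, not part of splitcode; it assumes the read sequence from the output below and the 0-based, end-exclusive span 89-106 that splitcode reports for E2 when detection succeeds. The `hamming` helper is a hypothetical name introduced here for illustration.

```python
# Read sequence copied from the splitcode output in this report.
READ = (
    "TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGC"
    "ACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA"
)

# E2 tag sequence as listed in the config table.
E2 = "TGTAGGTTCTGGAATCT"


def hamming(a: str, b: str) -> int:
    """Count substitutions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Span 89-106 is taken from the LX tag of the successful run
# (assumed to be 0-based and end-exclusive).
observed = READ[89:106]

# List every differing position as (offset in tag, expected base, observed base).
mismatches = [
    (i, expected, seen)
    for i, (expected, seen) in enumerate(zip(E2, observed))
    if expected != seen
]

print(hamming(E2, observed), mismatches)
```

Since E2 is configured with distances `1:1:1`, a single substitution should be within tolerance, which is why the missed detection is unexpected.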
Using the following config file, splitcode fails to detect the E2 and O85 tags:
| groups | ids | tags | distances | locations | previous |
|---|---|---|---|---|---|
| NYStgBot | Y18 | TCTCCTTACG | 1:1:1 | 0:0:11 | - |
| NYStgBot | Y32 | TGTAGTTCTA | 1:1:1 | 0:0:11 | - |
| OddBot3 | O45 | GCCTAGTAGAAGACGTT | 2:2:2 | 0:11:0 | {{NYStgBot}}4-10 |
| EvenBot2 | E41 | ATAGATTGTTGCGTGCT | 2:2:2 | 0:11:0 | {{OddBot3}}4-10 |
| OddBot2 | O21 | GGATAGCACCGTTCATT | 1:1:1 | 0:11:0 | {{EvenBot2}}4-10 |
| EvenBot1 | E2 | TGTAGGTTCTGGAATCT | 1:1:1 | 0:11:0 | {{OddBot2}}4-10 |
| OddBot1 | O85 | GCTGTGTCTGTCACCT | 1:1:1 | 0:11:0 | {{EvenBot1}}4-10 |
| DPM_const | DPM_const | TCATGTCTTCCGATCT | 2:0:2 | 0:11:0 | {{OddBot1}}4-10 |
| DPM_R2 | DPM1 | TGGGTGTTT | 1:0:1 | 0:11:0 | {DPM_const}0-0 |
# splitcode command
splitcode -c config.tsv --loc-names --out-fasta --pipe input.fastq
# output
>AV233703:20251015_Guttman_MRB:2512583691:1:10102:0992:0223 LX:Z:Y18:0,0-10,O45:0,17-34,E41:0,41-58,O21:0,65-82
TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGCACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA
However, upon removal of either the Y32 tag or the DPM1 tag (or both) from the config file, splitcode is able to detect the E2 and O85 tags:
# splitcode command
splitcode -c config_updated.tsv --loc-names --out-fasta --pipe input.fastq
# output
>AV233703:20251015_Guttman_MRB:2512583691:1:10102:0992:0223 LX:Z:Y18:0,0-10,O45:0,17-34,E41:0,41-58,O21:0,65-82,E2:0,89-106,O85:0,113-129
TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGCACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA
(The fact that DPM_const and DPM_R2 would extend past the end of the read appears to be irrelevant: padding the 3' end of the read with arbitrary sequence does not change whether E2 and O85 are detected.)
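As a sanity check on the expected result, here is a rough Python sketch that walks the tag chain by brute force. It is not splitcode's algorithm and makes several simplifying assumptions: only substitutions are considered (no indels), `4-10` in the `previous` column is read as "starts 4 to 10 bases after the end of the previous tag", the first tag is required to lie within positions 0-11, and only the tags actually present in this read are listed. The names `CHAIN` and `find_chain` are hypothetical.

```python
# Read sequence copied from the splitcode output in this report.
READ = (
    "TCTCCTTACGGACAACTGCCTAGTAGAAGACGTTTGACTTGATAGATTGTTGCGTGCTCACAACTGGATAGC"
    "ACCGTTCATTTGACTTGTGTAGGTTCTGGAATATGACAACTGCTGTGTCTGTCACCTTTGACTTGTCA"
)

# (tag id, sequence, max substitutions), in the order given by the
# `previous` column of the config. The substitution limit is taken
# from the `distances` column.
CHAIN = [
    ("Y18", "TCTCCTTACG", 1),
    ("O45", "GCCTAGTAGAAGACGTT", 2),
    ("E41", "ATAGATTGTTGCGTGCT", 2),
    ("O21", "GGATAGCACCGTTCATT", 1),
    ("E2", "TGTAGGTTCTGGAATCT", 1),
    ("O85", "GCTGTGTCTGTCACCT", 1),
]


def hamming(a: str, b: str) -> int:
    """Count substitutions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def find_chain(read, chain, first_window_end=11, min_gap=4, max_gap=10):
    """Return (id, start, end) for each chained tag, stopping at the first miss."""
    hits = []
    prev_end = None
    for name, seq, max_sub in chain:
        if prev_end is None:
            # Assumption: the first tag must fit entirely within [0, first_window_end).
            starts = range(0, first_window_end - len(seq) + 1)
        else:
            # Assumption: the gap is measured from the end of the previous tag.
            starts = range(prev_end + min_gap, prev_end + max_gap + 1)
        for start in starts:
            window = read[start:start + len(seq)]
            if len(window) == len(seq) and hamming(seq, window) <= max_sub:
                hits.append((name, start, start + len(seq)))
                prev_end = start + len(seq)
                break
        else:
            break  # no acceptable placement; the chain ends here
    return hits


for name, start, end in find_chain(READ, CHAIN):
    print(f"{name}:{start}-{end}")
```

Under these assumptions the scan places all six tags, with E2 and O85 at the same spans that splitcode reports once Y32 or DPM1 is removed from the config. That suggests the read itself satisfies the distance and spacing constraints, and that the missed detection depends on which other tags are present in the config.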
Executable Google Colab notebook: https://colab.research.google.com/drive/1JbHxLV3mXtD_qv3SUagfafiiq_FdAVXV