
Misclassification when close relatives are present #63

@ana-re

Hi @AlexanderDilthey

We spotted an issue with misclassification, particularly within the Brucella genus. We generated FASTQ files containing simulated ONT reads for 4 Brucella species and analysed them with Metamaps using a genus-level database constructed from all of the available RefSeq genomes for all Brucella species.

The results looked great when Metamaps was run on FASTQ files containing just one species: 100% of reads were correctly classified for 3 of the species and 99.95% for the fourth.
However, when we concatenated the 4 FASTQ files so that the input file contained all 4 of our Brucella species, the percentage of correctly classified reads dropped to as low as 1.18% for one of the species, with 39.9%, 46.93%, and 99.94% for the others.
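
For anyone trying to reproduce these figures, the per-species percentage of correctly classified reads can be computed roughly as follows. This is a minimal sketch, not the script used for the numbers above: the function name `per_species_accuracy`, the two dictionaries, and the placeholder labels are all hypothetical, and it assumes the true species of each simulated read and the species assigned to it have already been loaded into memory by read ID.

```python
from collections import defaultdict


def per_species_accuracy(truth, predicted):
    """Return {species: percent of that species' reads assigned to it}.

    truth:     dict mapping read ID -> true species label (assumed input)
    predicted: dict mapping read ID -> assigned species label (assumed input);
               reads absent from this dict are counted as misclassified
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for read_id, species in truth.items():
        total[species] += 1
        if predicted.get(read_id) == species:
            correct[species] += 1
    return {sp: 100.0 * correct[sp] / total[sp] for sp in total}


# Toy data with placeholder labels, not real results.
truth = {"r1": "species_A", "r2": "species_A",
         "r3": "species_B", "r4": "species_B"}
predicted = {"r1": "species_A", "r2": "species_B",
             "r3": "species_B", "r4": "species_B"}
accuracy = per_species_accuracy(truth, predicted)
```

Running the same calculation on the single-species and the concatenated inputs makes the two sets of percentages directly comparable, since the denominator for each species is the same set of simulated reads in both cases.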

I was hoping you could please investigate this and let us know how we can improve the classification in our analysis pipeline, which incorporates Metamaps.


Labels: question (Further information is requested)
