multiple connected components in mondo ontology  #495

@kmanpearl

I expected the processed Mondo ontology to have only one connected component, but it does not. I am not sure whether I am misunderstanding the processing steps, missing a function call or argument in my code (shown below), or hitting a bug.

from obnb.data import MondoDiseaseOntology
root = '../data/obnb/FullyRedundant'
dat = MondoDiseaseOntology(root=root)
g = dat.data
undirected = g.to_undirected_sparse_graph()
len(undirected.connected_components())
# 3136

Only two of these components are non-trivial: one with ~23k nodes and one with 41 nodes. The remaining ~3k components are isolated nodes with no edges in the ontology.
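For anyone who wants to reproduce this kind of breakdown without relying on obnb internals, here is a minimal sketch in plain Python. It assumes the ontology can be exported as a list of node IDs plus a list of `(parent, child)` edges; the function name `connected_components` and the toy data are hypothetical, not part of obnb.

```python
from collections import Counter, defaultdict, deque


def connected_components(nodes, edges):
    """Return a list of node sets, one per component, treating edges as undirected."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Include edge endpoints in case the node list is incomplete.
    all_nodes = dict.fromkeys(list(nodes) + [n for edge in edges for n in edge])

    seen = set()
    components = []
    for start in all_nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


# Toy data (made up for illustration): one chain, one pair, one isolated node.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

components = connected_components(nodes, edges)
size_hist = Counter(len(c) for c in components)  # component size -> count
n_isolated = size_hist[1]
print(sorted(size_hist.items()), n_isolated)
```

The size histogram separates the isolated nodes (size 1) from the components that actually carry ontology structure.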

This caused a further problem: the term MONDO:0006560 has gene annotations in obnb but no ontology edges, so when I build node embeddings from an edge list it is not treated as part of the ontology. I had to manually remove this term from the gene set collection before I could use my net2onto method with Mondo.
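The manual workaround can be sketched roughly as follows. This assumes the gene set collection can be represented as a plain dict mapping term ID to a set of genes, which is not necessarily how obnb stores it; the helper names, term IDs, and gene names are placeholders for illustration only.

```python
def terms_without_edges(annotated_terms, edges):
    """Return annotated terms that never appear as an endpoint in the edge list."""
    in_edges = {node for edge in edges for node in edge}
    return sorted(set(annotated_terms) - in_edges)


def drop_terms(gene_sets, terms_to_drop):
    """Return a copy of the gene set mapping without the given terms."""
    drop = set(terms_to_drop)
    return {term: genes for term, genes in gene_sets.items() if term not in drop}


# Placeholder data: TERM:ISOLATED has annotations but no ontology edges.
gene_sets = {
    "TERM:A": {"GENE1", "GENE2"},
    "TERM:B": {"GENE3"},
    "TERM:ISOLATED": {"GENE4"},
}
edges = [("TERM:A", "TERM:B")]

missing = terms_without_edges(gene_sets, edges)
filtered = drop_terms(gene_sets, missing)
print(missing, sorted(filtered))
```

Running a check like this up front flags every annotated term that an edge-list-based embedding would silently miss, rather than discovering them one at a time.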

If filtering is simply not implemented, could we add a feature that restricts ontologies to their largest connected component? If this is a bug, could it be fixed? And if I am just missing something in my code, please let me know the proper way to process the ontology.
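To make the request concrete, here is a rough sketch of what a largest-connected-component filter could do, written against a plain edge list rather than obnb's graph classes. The function `largest_component_edges` is hypothetical and uses a simple union-find; it is not an existing obnb feature.

```python
from collections import Counter


def largest_component_edges(edges):
    """Keep only the edges (and nodes) in the largest undirected component."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two endpoints of every edge into one set.
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    if not parent:
        return [], set()

    # Count the nodes under each root and keep the biggest group.
    sizes = Counter(find(n) for n in list(parent))
    root, _ = sizes.most_common(1)[0]
    kept_nodes = {n for n in list(parent) if find(n) == root}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes]
    return kept_edges, kept_nodes


# Toy data (made up): a four-node chain and a separate two-node component.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
kept_edges, kept_nodes = largest_component_edges(edges)
print(kept_edges, sorted(kept_nodes))
```

Nodes with no edges never appear in an edge list, so they are excluded automatically. Any gene set annotations for dropped terms would need to be filtered against `kept_nodes` as well.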
