
feat: function to return *all* matching genes/transcripts by id#201

Open
ovesh wants to merge 2 commits into main from gk-raise-error-on-multiple-transcripts-per-id-tinf-045


@ovesh ovesh (Contributor) commented Mar 10, 2026

Allow users to avoid errors in cases where there are multiple matches for an id (typical of, but not limited to, pseudoautosomal regions).

The gene/transcript table __getitem__ returns a single arbitrary match.
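To illustrate the difference between the two lookups, here is a minimal pure-Python sketch. It is not the library's implementation: `Gene`, `GeneTable`, the `chrom` field, and the id `GENE_PAR` are hypothetical stand-ins, and returning an empty list for an unknown id is an assumption rather than confirmed behavior.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical stand-in for the library's gene record.
    id: str
    chrom: str


class GeneTable:
    """Toy table for illustration only; not the library's implementation."""

    def __init__(self, genes):
        self._by_id = defaultdict(list)
        for gene in genes:
            self._by_id[gene.id].append(gene)

    def __getitem__(self, id):
        # Returns one match only. When an id is duplicated, callers
        # cannot tell that other matches were silently dropped.
        matches = self._by_id.get(id)
        if not matches:
            raise KeyError(id)
        return matches[0]

    def find_by_id(self, id):
        # Returns every match (assumption: empty list for an unknown id).
        return list(self._by_id.get(id, []))


# A hypothetical id that appears on both chrX and chrY.
table = GeneTable([
    Gene("GENE_PAR", "chrX"),
    Gene("GENE_PAR", "chrY"),
    Gene("GENE_A", "chr1"),
])

single = table["GENE_PAR"]              # one arbitrary match
all_matches = table.find_by_id("GENE_PAR")  # both matches
```

With a lookup like `find_by_id`, the caller sees both records and decides explicitly how to handle the duplicate instead of silently receiving one of them.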

return mock_result(Gene)

@mock
def find_by_id(self, id): # pragma: no cover
@s22chan s22chan (Collaborator) commented Mar 11, 2026

Should newer code have type annotations, or does the mock override not play well with them?

I'm realizing first_by_name doesn't follow the conventions of the rest of the code.

@ovesh ovesh (Contributor, Author)

Good point. I'll check VS Code and JetBrains first to make sure.
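As a rough way to check whether annotations survive a decorator, the sketch below annotates a `find_by_id` stub and inspects it at runtime. The `mock` decorator here is a hypothetical stand-in built on `functools.wraps`; the project's real `@mock` may behave differently, and the `list[Gene]` return type is an assumption. A runtime check like this does not replace verifying what the IDEs actually display.

```python
import functools
from typing import get_type_hints


class Gene:
    """Hypothetical placeholder for the real Gene type."""


def mock(func):
    # Hypothetical stand-in for the project's @mock decorator.
    # functools.wraps copies __annotations__ onto the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


class GeneTable:
    @mock
    def find_by_id(self, id: str) -> list[Gene]:  # pragma: no cover
        return []


# Annotations remain visible through the decorated method.
hints = get_type_hints(GeneTable.find_by_id)
```

If a decorator does not use `functools.wraps` (or an equivalent), the wrapper exposes its own signature and the annotations are lost to tooling, which is one way a mock override could fail to play well with type hints.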

return nullptr;
}

template <typename T> // T = PyGene, PyTran -- anything that when unpacked has an 'id' member.
@s22chan s22chan (Collaborator)

Does this need to be in C++? Is it a hotspot?

@ovesh ovesh (Contributor, Author)

Good question. I admit I didn't give it much thought before implementing it.

It's not a hotspot yet because it's a new API, but we'll try to steer users toward it to avoid the duplicate-per-id footgun. We discussed changing PyGenomeAnnoTable_GetSubscript_ByID() to raise an error on duplicates, but realized duplicates are too common, so that change would likely break existing code.

I think this particular function is easy enough to maintain that it can stay in C++.

Co-authored-by: Steve Chan <32464643+s22chan@users.noreply.github.com>