
Contributor

@ml-evs ml-evs commented Mar 7, 2023

This PR quickly mocks up the addition of URIs to dataset target metadata. It partially addresses #8, though we need to decide how useful this is first.

The idea is that these URIs can at least i) resolve whether two definitions are the same across datasets and could potentially ii) be used to augment the dataset with canonical descriptions and semantic links, either during prep or by the model on-the-fly.
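To make point (i) concrete, here is a minimal sketch of how shared URIs could be used to spot targets that carry the same definition across datasets. The record layout (the `id` and `uris` fields), the dataset names and the `example.org` URIs are all placeholders invented for illustration, not the schema in this PR.

```python
from collections import defaultdict

# Hypothetical metadata records: the field names ("id", "uris") and the
# example.org URIs are placeholders, not the project's actual schema.
datasets = {
    "dataset_a": [
        {"id": "logP", "uris": ["https://example.org/onto/PARTITION_COEFFICIENT"]},
    ],
    "dataset_b": [
        {"id": "log_p_octanol", "uris": ["https://example.org/onto/PARTITION_COEFFICIENT"]},
        {"id": "solubility", "uris": ["https://example.org/onto/SOLUBILITY"]},
    ],
}


def shared_definitions(datasets):
    """Return URIs cited by targets in more than one dataset.

    The result maps each such URI to a sorted list of
    (dataset name, target id) pairs that reference it.
    """
    by_uri = defaultdict(set)
    for name, targets in datasets.items():
        for target in targets:
            for uri in target.get("uris", []):
                by_uri[uri].add((name, target["id"]))
    return {
        uri: sorted(refs)
        for uri, refs in by_uri.items()
        if len({dataset for dataset, _ in refs}) > 1
    }


if __name__ == "__main__":
    for uri, refs in shared_definitions(datasets).items():
        print(uri, refs)
```

This only handles identical URIs, which is all an "exact match" mapping can express.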

This currently assumes an "exact match" style mapping between target and property -- we could build additional semantic context into the schema here to enable things like related identities/subclasses/parthood and all that jazz. I struggled with the Butkiewicz sets as they are really outside my field: definitions are available for individual terms such as cav3, t-type, calcium channel and activity, but not for the combined target activity_cav3_t_type_calcium_channels.

As discussed, this is quite a niche task that may not be suitable to ask others to perform. Even in my own case, it is not clear exactly how good these particular definitions are -- I just went via BioPortal for fields that have good matches: https://bioportal.bioontology.org/

@ml-evs ml-evs marked this pull request as ready for review March 7, 2023 18:27
Contributor Author

ml-evs commented Mar 7, 2023

Potential further changes before merging:

  • Use "CURIE"s instead of URIs, pinning the definitions to a set of anointed prefixes/ontologies that we want to encourage, e.g., bioassay ontology, IUPAC gold book
  • Write/use a pydantic validator that checks whether links are up (it could even check whether they serve e.g., JSON-LD at that link and see if a definition can be extracted...)
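As a rough sketch of both ideas, the snippet below expands CURIEs against a fixed prefix map and offers a simple liveness check. It uses only the standard library; the same functions could be called from a pydantic validator. The prefix names, the base URLs in `PREFIXES`, and the helper names `expand_curie` and `link_is_up` are assumptions made for illustration and would need checking against each ontology's documentation.

```python
import re
import urllib.error
import urllib.request

# Assumed prefix map: both the prefix names and the base URLs are
# illustrative and should be verified before use.
PREFIXES = {
    "bao": "http://www.bioassayontology.org/bao#",
    "goldbook": "https://goldbook.iupac.org/terms/view/",
}

# A CURIE is "prefix:reference"; this pattern is deliberately permissive.
CURIE_RE = re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<ref>\S+)$")


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE to a full URI, rejecting prefixes outside the map."""
    match = CURIE_RE.match(curie)
    if match is None:
        raise ValueError(f"not a CURIE: {curie!r}")
    prefix = match["prefix"].lower()
    if prefix not in prefixes:
        raise ValueError(f"prefix {prefix!r} is not in the allowed set")
    return prefixes[prefix] + match["ref"]


def link_is_up(uri, timeout=5.0):
    """Return True if a HEAD request to the URI succeeds.

    Any network error, timeout or HTTP error status counts as "down".
    """
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False


if __name__ == "__main__":
    uri = expand_curie("bao:BAO_0000015")  # illustrative identifier
    print(uri, link_is_up(uri))
```

Some servers reject HEAD requests, so a real check might need to fall back to GET, and running network checks at validation time would make validation slow and dependent on connectivity.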

@kjappelbaum
Collaborator

Somehow pushing to this PR opened another PR #103. Let's proceed with this one, as it also has the history of Matthew's commits.

Development

Successfully merging this pull request may close these issues.

Consider adding URIs for dataset targets and properties
