Skip to content

Conversation

@kjappelbaum
Copy link
Collaborator

@kjappelbaum kjappelbaum commented Mar 11, 2023

Since we approved pre-commit.ci we can now drop those steps from the GitHub actions.

I also wanted to ensure that the CI on main passes and went through the issues and realized that some datasets are recorded in a sub-optimal way. I also fixed those issues and also added the links to identifiers.

ToDo:

  • update contribution guide:

    • add groups for train/test split
    • ideally retrieve data from original source in script
    • do not simply copy/paste descriptions
    • pubchem assay ids
    • uris
  • fix all datasets

I will add the new fields to the schema in a separate PR to not make things too messy.

@kjappelbaum kjappelbaum linked an issue Mar 11, 2023 that may be closed by this pull request
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 088fe83 to 5d64ff5 Compare March 11, 2023 14:15
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 3419871 to 81ba904 Compare March 11, 2023 17:31
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 5657d2a to 19b1047 Compare March 11, 2023 17:54
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from c1964a5 to 2a000b4 Compare March 11, 2023 18:04
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 6d1438c to 6d7614f Compare March 11, 2023 18:12
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 5aa25f3 to c09a83c Compare March 13, 2023 07:19
@kjappelbaum kjappelbaum force-pushed the kjappelbaum/issue82 branch from 28d62f9 to c84b642 Compare March 13, 2023 08:11
@kjappelbaum kjappelbaum requested a review from MicPie March 13, 2023 14:31
Copy link
Contributor

@MicPie MicPie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor things.
We can also look into the spaces thing when we create the prompts, i.e., remove them there, if it is too complicated here.
I sometimes added some words in the description, if we will not use them for the prompt than we don't need them afaik.
I'll look into fixing the stuff that is clear to me.

---
name: ClinTox
description: The ClinTox dataset includes drugs that have failed clinical trials for toxicity reasons and also drugs that are associated with successful
description: The ClinTox dataset includes drugs that have failed clinical trials for toxicity reasons and also drugs that are associated with successful
Copy link
Contributor

@MicPie MicPie Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not important: remove additional spaces?)

description: whether it can cause clinical toxicity (1) or not (0).
units: clinical_toxicity
type: categorical
type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set the type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also comment below?

@MicPie
Copy link
Contributor

MicPie commented Mar 13, 2023

data/kcnq2_potassium_channel_butkiewicz/meta.yaml results in an error because units are not set.
Do we need to add Optional[str] to https://github.com/OpenBioML/chemnlp/blob/main/src/chemnlp/data_val/model.py#L53 ?
See also #98 (comment)

@MicPie
Copy link
Contributor

MicPie commented Mar 14, 2023

Note: : in the text breaks the commit hooks for the yaml 😱

MicPie and others added 17 commits March 14, 2023 14:48
* Add `uris` field for identifiers

* Linting

* update valdation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: fix typo

---------

Co-authored-by: Matthew Evans <git@ml-evs.science>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michael Pieler <Michael.Pieler@Gmail.com>
@validator("pubchem_aids")
def uris_resolves(cls, values):
if values is not None:
for uri in values.get("uris"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjappelbaum Because pubchem_aids: Optional[List[int]] I guess this needs to be something like for uri in values: and then we need to create the correct URL for the request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because currently we get:

.../chemnlp/src/chemnlp/data_val/model.py", line 110, in uris_resolves
    for uri in values.get("uris"):
AttributeError: 'list' object has no attribute 'get'

@kjappelbaum
Copy link
Collaborator Author

@MicPie shall we merge this one and then make one more pass once this is in main?

@MicPie MicPie self-requested a review March 16, 2023 07:26
@MicPie MicPie merged commit 57ce6b0 into main Mar 16, 2023
@MicPie MicPie deleted the kjappelbaum/issue82 branch March 16, 2023 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run pre-commit in CI

3 participants