chore: run pre-commit in CI, rework datasets #98
MicPie
left a comment
Just some minor things.
We can also deal with the extra spaces when we create the prompts, i.e., remove them there, if it is too complicated to do here.
I sometimes added a few words to the description; if we will not use them for the prompt, then we don't need them afaik.
I'll look into fixing the things that are clear to me.
| --- | ||
| name: ClinTox | ||
| description: The ClinTox dataset includes drugs that have failed clinical trials for toxicity reasons and also drugs that are associated with successful | ||
| description: The ClinTox dataset includes drugs that have failed clinical trials for toxicity reasons and also drugs that are associated with successful |
(not important: remove additional spaces?)
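If the extra spaces end up being removed at prompt-creation time instead, a minimal sketch could look like the following. The helper name `normalize_whitespace` and the sample string are hypothetical and not part of the repository.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining restores single spaces.
    return " ".join(text.split())


# Illustrative input only; not copied from meta.yaml.
description = "drugs that have failed  clinical trials   for toxicity reasons "
print(normalize_whitespace(description))
```

Note that this also folds newlines into spaces, which may or may not be wanted for multi-line descriptions.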
data/ClinTox/meta.yaml
Outdated
| description: whether it can cause clinical toxicity (1) or not (0). | ||
| units: clinical_toxicity | ||
| type: categorical | ||
| type: |
set the type?
See also comment below?
* Add `uris` field for identifiers
* Linting
* update valdation
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* feat: fix typo

---------

Co-authored-by: Matthew Evans <git@ml-evs.science>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michael Pieler <Michael.Pieler@Gmail.com>
This reverts commit 0f766e5.
This reverts commit dff4331.
src/chemnlp/data_val/model.py
Outdated
| @validator("pubchem_aids") | ||
| def uris_resolves(cls, values): | ||
| if values is not None: | ||
| for uri in values.get("uris"): |
@kjappelbaum Because `pubchem_aids: Optional[List[int]]`, I guess this needs to be something like `for uri in values:`, and then we need to create the correct URL for the request?
Because currently we get:
.../chemnlp/src/chemnlp/data_val/model.py", line 110, in uris_resolves
for uri in values.get("uris"):
AttributeError: 'list' object has no attribute 'get'
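A minimal sketch of the fix suggested above, written as plain Python rather than as the actual pydantic validator: iterate over the list of AIDs directly and build a URL for each one. The URL template, the helper names (`aid_to_url`, `check_pubchem_aids`), and the injected `resolves` callable are all assumptions for illustration and are not taken from this PR.

```python
from typing import Callable, List, Optional

# Assumed URL pattern for a PubChem BioAssay record page (not from the PR).
PUBCHEM_AID_URL = "https://pubchem.ncbi.nlm.nih.gov/bioassay/{aid}"


def aid_to_url(aid: int) -> str:
    """Build the (assumed) PubChem URL for one assay identifier."""
    return PUBCHEM_AID_URL.format(aid=aid)


def check_pubchem_aids(
    values: Optional[List[int]],
    resolves: Callable[[str], bool],
) -> Optional[List[int]]:
    """Check that every AID maps to a URL that `resolves` accepts.

    `values` is a plain list of ints (or None), so it is iterated
    directly instead of calling `.get("uris")` on it.
    """
    if values is None:
        return values
    for aid in values:
        url = aid_to_url(aid)
        if not resolves(url):
            raise ValueError(f"PubChem AID {aid} does not resolve: {url}")
    return values


# Usage with a stand-in resolver; a real one would issue an HTTP request.
print(check_pubchem_aids([1, 2], resolves=lambda url: True))
```

Passing the resolver in as an argument keeps the URL-building logic testable without network access; inside the model, the same loop body would sit in the `pubchem_aids` validator.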
@MicPie shall we merge this one and then make one more pass once this is in main?
Since we approved `pre-commit.ci`, we can now drop those steps from the GitHub Actions.
I also wanted to ensure that the CI on `main` passes, so I went through the issues and realized that some datasets are recorded in a sub-optimal way. I fixed those issues and also added the links to identifiers.
ToDo:
* update contribution guide
* fix all datasets
I will add the new fields to the schema in a separate PR to not make things too messy.