Replace the use of bidsschematools with the use of deno-compiled BIDS validator in obtaining validation result#1599
Conversation
7e4d870 to
0ca8b1f
Compare
|
I wonder if while at this we should add an option which would pretty much list which validators are to be used or not used. This way we could still keep |
I am not sure I understand exactly what you are proposing. Are these the enums you are referring to. They are validators for different standards not just for BIDS, so I don't know what you mean by having them as alternatives to For as far as I know, there is currently only one use of I think we can talk about this more in tomorrow's meeting with some clarifications. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1599 +/- ##
==========================================
+ Coverage 88.29% 88.78% +0.49%
==========================================
Files 78 82 +4
Lines 11045 11399 +354
==========================================
+ Hits 9752 10121 +369
+ Misses 1293 1278 -15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f6ebc3e to
0505ffc
Compare
dandi/files/bids.py
Outdated
| # end of ad-hoc fix. | ||
|
|
||
| results = validate_bids(self.bids_root) | ||
| results = bids_validate(self.bids_root) |
There was a problem hiding this comment.
so here, if input path was not the top of the bids dataset -- issue a lgr.WARNING that we will use bidsschematools validation only per each path, and it would validate only the path names. To validate full bids dataset -- point to the top of the bids dataset.
There was a problem hiding this comment.
note: here we might not already have a clue what were those original paths! ...
fb5e0ea to
023ab9c
Compare
6fb2f5b to
23b796c
Compare
|
@CodyCBakerPhD I have completed the refactoring you suggested. This PR is ready for another look. The tests are failing mostly because of https://ontobee.org/ is currently half dead. |
This is good contextual information, I will keep an eye for how the metadata forms differ Might be worth renaming the PR then to focus on the 'new feature addition' since there is, after all, no replacement (hence I was confused where the replacing was 😆) I will try this hands on tomorrow and come back with anything of note |
|
From @candleindark
|
| def _get_metadata(self) -> None: | ||
| """ | ||
| Get metadata for all assets in the dataset | ||
|
|
||
| This populates `self._asset_metadata` | ||
| """ | ||
| with self._lock: | ||
| if self._dataset_errors is None: | ||
| if self._asset_metadata is None: | ||
| # Import here to avoid circular import | ||
| from dandi.validate import validate_bids |
There was a problem hiding this comment.
Oh my, OK so to my understanding, _get_metadata still relies on the old dandi.validate.validate_bids?
Used by get_asset_metadata at https://github.com/dandi/dandi-cli/pull/1599/files#diff-70a4bb759311189af6f4468d78254d863319fc4660ebe437060b6634c5a7b724L106
And now new _validate method (https://github.com/dandi/dandi-cli/pull/1599/files#diff-70a4bb759311189af6f4468d78254d863319fc4660ebe437060b6634c5a7b724R109) called by previous (and still) get_validation_errors at line https://github.com/dandi/dandi-cli/pull/1599/files#diff-70a4bb759311189af6f4468d78254d863319fc4660ebe437060b6634c5a7b724R151 is the 'replacement' meant here
There was a problem hiding this comment.
Is that an apt summary of the 'integration' of the new feature?
There was a problem hiding this comment.
Oh my, OK so to my understanding,
_get_metadatastill relies on the olddandi.validate.validate_bids?
Yes. There is no equivalent metadata provided by the deno-compiled validator, so we are keeping the bidsschematools to provide the same kind of metadata.
Is that an apt summary of the 'integration' of the new feature?
Yes, that's the gist of it. I found the current design of validation (metadata gathering as well) in DANDI complex. This is a method of integration that I found that adhere to the current design.
@yarikoptic may have some plan for the validation process in dandi-cli in the future. What I see is that any design change in the validation process would entail significant changes in other parts of the program.
|
From PR description
There do not seem to be any left? (good thing) |
|
Ran this on the now published Dandiset Details[BIDS.SIDECAR_KEY_RECOMMENDED] ...
[BIDS.SIDECAR_KEY_RECOMMENDED] E:\test_bids_validator\001350\sub-375\micr\sub-375_sample-0008_TEM.png — A data file's JSON sidecar is missing a key listed as recommended.
subCode: SampleStaining
issueMessage: Field description: Description(s) of the tissue sample staining (for example: `"Osmium"`).
MAY be an array of strings if different stains are used in each channel of the file
(for example: `["LFB", "PLP"]`).
[BIDS.SIDECAR_KEY_RECOMMENDED] E:\test_bids_validator\001350\sub-375\micr\sub-375_sample-0008_TEM.png — A data file's JSON sidecar is missing a key listed as recommended.
subCode: SamplePrimaryAntibody
issueMessage: Field description: Description(s) of the primary antibody used for immunostaining.
Either an [RRID](https://rrid.site) or the name, supplier and catalog
number of a commercial antibody.
For non-commercial antibodies either an [RRID](https://rrid.site) or the
host-animal and immunogen used (for examples: `"RRID:AB_2122563"` or
`"Rabbit anti-Human HTR5A Polyclonal Antibody, Invitrogen, Catalog # PA1-2453"`).
MAY be an array of strings if different antibodies are used in each channel of the file.
[BIDS.SIDECAR_KEY_RECOMMENDED] E:\test_bids_validator\001350\sub-375\micr\sub-375_sample-0008_TEM.png — A data file's JSON sidecar is missing a key listed as recommended.
subCode: SampleSecondaryAntibody
issueMessage: Field description: Description(s) of the secondary antibody used for immunostaining.
Either an [RRID](https://rrid.site) or the name, supplier and catalog
number of a commercial antibody.
For non-commercial antibodies either an [RRID](https://rrid.site) or the
host-animal and immunogen used (for examples: `"RRID:AB_228322"` or
`"Goat anti-Mouse IgM Secondary Antibody, Invitrogen, Catalog # 31172"`).
MAY be an array of strings if different antibodies are used in each channel of the file.
[BIDS.MISSING_REQUIRED_ENTITY] E:\test_bids_validator\001350\sub-375\micr\sub-375_TEM.json — Missing required entity for files with this suffix.
issueMessage: sample missing from rule rules.files.raw.micr.microscopyWith a ballpark total of many dozen such messages repeated across various assets and none of these repeated recommendations being aggregated (and thus rather hard to read) Will try again for comparison with last official release of CLI |
That file should be dandi-cli/dandi/bids_validator_deno/_validator.py Lines 345 to 346 in cde85ab @yarikoptic What's your opinion on that todo? The default behavior of the BIDS validator itself is to skip all the ignored issues on the regular, human facing, output, but include the ignored issues on JSON output. This PR currently skips all the ignore issues, not packing them up as dandi-cli/dandi/validate_types.py Lines 109 to 130 in cde85ab |
The output looks about right. Running out.json{
"issues": {
"issues": [
{
"code": "NOT_INCLUDED",
"location": "/dandiset.yaml",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-372/micr/sub-372_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-367A/micr/sub-367A_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-375/micr/sub-375_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-374/micr/sub-374_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-373C/micr/sub-373C_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-366A/micr/sub-366A_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-369B/micr/sub-369B_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-368A/micr/sub-368A_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-371/micr/sub-371_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
},
{
"code": "MISSING_REQUIRED_ENTITY",
"location": "/sub-370/micr/sub-370_TEM.json",
"issueMessage": "sample missing from rule rules.files.raw.micr.microscopy",
"rule": "rules.files.raw.micr.microscopy",
"severity": "error"
}
],
"codeMessages": {
"NOT_INCLUDED": "Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a \".bidsignore\" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder.",
"MISSING_REQUIRED_ENTITY": "Missing required entity for files with this suffix."
}
},
"summary": {
"sessions": [],
"subjects": [
"372",
"367A",
"375",
"374",
"373C",
"366A",
"369B",
"368A",
"371",
"370"
],
"subjectMetadata": [
{
"participantId": "366A"
},
{
"participantId": "367A"
},
{
"participantId": "368A"
},
{
"participantId": "369B"
},
{
"participantId": "370"
},
{
"participantId": "371"
},
{
"participantId": "372"
},
{
"participantId": "373C"
},
{
"participantId": "374"
},
{
"participantId": "375"
}
],
"tasks": [],
"modalities": [
"micr"
],
"secondaryModalities": [],
"totalFiles": 103,
"size": 1033321299,
"dataProcessed": false,
"pet": {},
"dataTypes": [
"micr"
],
"schemaVersion": "1.0.4"
}
}Each error issue is mapped to a validation error, red, in dandi-cli/dandi/bids_validator_deno/_validator.py Lines 37 to 40 in cde85ab BIDS validator generates many more errors than |
Yes, of course - and agreed - I am merely thinking aloud when I note such things Upon testing validation on EDIT: reading
Indeed checks out. Anything is greater than zero, after all |
|
LGTM Code is very well organized, well written, and well documented! Great job @candleindark Just need @yarikoptic to weigh in on severity level of ignore I believe, for comparison, that NWB Inspector would have straight up skipped (just not have run) any checks on things below a specified 'severity' level |
In case you have not noticed yet, |
This commit implements BIDS validation of a direction through the deno-compiled BIDS validator. It provides the `bids_validate()` function which when called invoke the deno-compiled validator to validate a directory. The result of the validation will be returned as a list of `ValidationResult` objects.
`bids_validate()` calls the deno-compiled BIDS validator to do the actual validation
Annotating these attribute in `BIDSDatasetDescriptionAsset` with `defaultdict`more precisely describe their behavior
This retains the original logic to obtain the metadata
This allows BIDS schema version to be recorded in `ValidationResult` instances Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
BIDS version is actually not always available since the BIDS validator currently doesn't provide info regarding BIDS version. See bids-standard/bids-validator#10 (comment) for details.
To prevent external calls to private members of `BIDSDatasetDescriptionAsset`
Of the `ValidationResult` objects returned from validation by the deno-compiled validator Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Not ignoring `dandiset.yaml` in BIDS validation, and adjust validation error message in the context of DANDI to direct the user to specify a `.bidsignore` file to exclude the `dandiset.yaml` file from BIDS validation properly
…nit__.py` This keeps implementation details private and control the definition of a public API as recommended by @CodyCBakerPhD
cde85ab to
e33159f
Compare
|
ok, let's proceed here. Note that I believe (from a trial run) we would still use bidsschematools to "extract" metadata which is a step to validate any asset against dandi schema, and only then we would use deno validator to do even more validation. I guess, let's proceed with this and improve upon if needed. Thank you @candleindark for the PR and @CodyCBakerPhD for the review. |
|
🚀 PR was released in |

This PR addresses #1597 partly. It doesn't replace the use of
bidsschematoolscompletely with the use of the deno-compiled BIDS validator.bidsschematoolsis still needed in obtaining BIDS related metadata for individual assets. However, BIDS validation results for a dataset is now obtained completely through the deno-compiled BIDS validator.Additionally, by modifying the validation error message regarding the
dandiset.yamlfile in BIDS validation, the PR includes changes to inform the user aboutdandiset.yamlin BIDS validation in the context of DANDI. This PR thus also closes #1602.Remaining TODOs:
bids_validate()func support the--ignoreNiftiHeadersoption in the underlying deno-compiled BIDS validatorbids_validate()func support the--recursiveoption in the underlying deno-compiled BIDS validatorstdout).bids_validate()func support the--configoption in the underlying deno-compiled BIDS validatorNote:
bids_validator_deno/_validator.py. Those todos denote options that may need to be adjusted.