Skip to content

Overhauling validation results to get them closer to cover different types of validators#1514

Merged
yarikoptic merged 6 commits intomasterfrom
ext-validation
Apr 4, 2025
Merged

Overhauling validation results to get them closer to cover different types of validators#1514
yarikoptic merged 6 commits intomasterfrom
ext-validation

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Oct 18, 2024

Ref:

and result of our chat with @candleindark while also reviewing results of linkml validation over dandisets:

Notes/TODOs:

  • BREAKING CHANGE:
    • we replaced ValidationOrigin.bids_version to more generic Origin.standard + Origin.standard_version
    • We renamed ValidationOrigin.name to Origin.validator and ValidationOrigin.version to Origin.validator_version
    • Severity.HINT, Severity.WARNING, and Severity.ERROR now have value of 20, 30, and 40 respectively instead of 1, 2, and 3.
    • Introduce Origin.type to indicate the type of the origin of a validation result.
    • We made the ValidationResult model into a Pydantic model so that it is readily to serialization to JSON.
  • going through the code base and harmonizing construction of those ValidationResults
  • redo Validation results refactoring: centralize results for different reused origins, use _pydantic_errors_to_validation_results more #1176 in here

Please see individual commit messages for more details of the change.

Note:
This PR needs the changes in #1601 to pass the tests. Rebase/merge with master once #1601 is merged into master.

@codecov
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 86.66667% with 14 lines in your changes missing coverage. Please review.

Project coverage is 88.66%. Comparing base (b1815da) to head (cf401cd).
Report is 110 commits behind head on master.

Files with missing lines Patch % Lines
dandi/files/zarr.py 27.27% 8 Missing ⚠️
dandi/pynwb_utils.py 60.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1514      +/-   ##
==========================================
+ Coverage   88.54%   88.66%   +0.11%     
==========================================
  Files          78       78              
  Lines       10872    10937      +65     
==========================================
+ Hits         9627     9697      +70     
+ Misses       1245     1240       -5     
Flag Coverage Δ
unittests 88.66% <86.66%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@candleindark
Copy link
Member

In updating the use of ValidationOrigin, it is not clear to me what should be provided for the standard and standard_version attributes of many ValidationOrigin objects. For example, this is not clear in ValidationOrigin objects in organized.py.

origin=ValidationOrigin(name="dandi", version=__version__),

Do we have a dandi standard? If we do, it doesn't seem that we version this standard.

I think we should indeed define an enum class for the supported standard, and also talk about how to retrieve the version of those standards.

@candleindark candleindark force-pushed the ext-validation branch 4 times, most recently from dcb456e to 2df818f Compare March 18, 2025 01:46
@candleindark
Copy link
Member

candleindark commented Mar 19, 2025

In updating the use of ValidationOrigin, it is not clear to me what should be provided for the standard and standard_version attributes of many ValidationOrigin objects. For example, this is not clear in ValidationOrigin objects in organized.py.

origin=ValidationOrigin(name="dandi", version=__version__),

Do we have a dandi standard? If we do, it doesn't seem that we version this standard.

I think we should indeed define an enum class for the supported standard, and also talk about how to retrieve the version of those standards.

We now, with the latest commit, have a subclass of Enum, Standard, to represent a list of standards. Some standard is informal, e.g., Standard.DANDI_LAYOUT and have no versioning. In the case, of a standard with no version, we will just let the standard_version attribute of Origin to default to None.

candleindark added a commit that referenced this pull request Mar 21, 2025
Per #1514 (comment)
this request should be postponed for now
@candleindark
Copy link
Member

candleindark commented Mar 21, 2025

@yarikoptic I think a ValidationResult object should represent the result of how well a "data" instance is validated. It should inform area/aspect of invalidity.

except Exception as e:
if devel_debug:
raise
lgr.warning(
"Unexpected validation error for %s: %s",
self.filepath,
e,
extra={"validating": True},
)
return [
ValidationResult(
origin=ValidationOrigin(
name="dandi",
version=dandi.__version__,
),
severity=Severity.ERROR,
id="dandi.SOFTWARE_ERROR",
scope=Scope.FILE,
# metadata=metadata,
path=self.filepath, # note that it is not relative .path
message=f"Failed to read metadata: {e}",
# TODO? dataset_path=dataset_path,
dandiset_path=self.dandiset_path,
)
]

The above snippet uses ValidationResult to represent an error other than a ValidationError. The captured error doesn't tell how valid a data instance is. It has nothing to do with the data being validated against but how our software is written. If that error is raised it is either a bug in the software, which should be propagated, or a user's mistake, which should be handled gracefully in another manner. The current arrangement is conflating error handling with the purpose of ValidationResult. Case in point, you can't provide the standard for the origin field in the above example.

Another misuse of ValidationResult is below.

try:
version = get_nwb_version(path, sanitize=False)
except Exception:
# we just will not remove any errors, it is required so should be some
pass
else:
if version is not None:
# Explicitly sanitize so we collect warnings.
nwb_errors: list[str] = []
version = _sanitize_nwb_version(version, log=nwb_errors.append)
for e in nwb_errors:
errors.append(
ValidationResult(
origin=ValidationOrigin(
name="pynwb",
version=pynwb.__version__,
),
severity=Severity.ERROR,
id="pynwb.GENERIC",
scope=Scope.FILE,
path=Path(path),
message=e,
)
)

The code snippet above also has other problems. For example, it suppresses errors. The problems should be fixed in an other dedicated PR though.

@candleindark candleindark force-pushed the ext-validation branch 2 times, most recently from 46eb9d3 to 06e8fe8 Compare March 24, 2025 04:40
@candleindark candleindark force-pushed the ext-validation branch 3 times, most recently from d27cad7 to a2cec1b Compare March 24, 2025 17:58
yarikoptic and others added 3 commits March 31, 2025 14:36
This commit include the following changes
to make `ValidationResult` a better
representation of the result of a
validation process. (In this message
"models" is used to refer to `ValidationResult`
and related models defined in
`validate_types.py`.
- Redefine the models as Pydantic models for
  easy serialization to JSON
- Introduce `Enum` types for used
  in fields in the models.
- Rename `ValidationOrigin` to `Origin`
- Replace `ValidationOrigin.bids_version` to
  more generic `Origin.standard` +
  `Origin.standard_version`
- Renamed `ValidationOrigin.name` to
  `Origin.validator` and `
  ValidationOrigin.version` to
  `Origin.validator_version`
- `Severity.HINT`, `Severity.WARNING`, and
  `Severity.ERROR` now have value of `20`, `30`,
   and `40` respectively instead of `1`, `2`, and
   `3`.
- Introduce `Origin.type` to indicate the type
  of the origin of a validation result
- Documentation in models are now expressed in
  triple-quoted strings
- Correction in use of the models where appropriate.
  I.e. corrections in settings of the fields of
  `Origin` model instances

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
This is needed to support the syntax of `|` in
expressing union types in Python 3.9. This has been
needed long ago; however, the necessity only
becomes apparent after `|` is used in Pydantic
model definitions since Pydantic evaluate
those types at runtime.

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
@candleindark
Copy link
Member

This PR needs the changes in #1601 to pass the tests. It will be merged with master once #1601 is merged into master.

Bring in changes that remove deprecated
uses of Pydantic
@candleindark
Copy link
Member

The Change from #1601 is merged. This PR is ready for a final review.

Copy link
Member Author

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few spots where to define local origin to be reused.

But also I wonder if generally we should pre-define all those origins so they get reused across all the paths. It smells a little as premature optimization to me though, so may be not worth it. But at least I think it is worth renaming them locally so they become a little more descriptive than just origin variables

@candleindark candleindark force-pushed the ext-validation branch 2 times, most recently from c09cab4 to 1836ea2 Compare April 3, 2025 20:28
The validations are done by the `dandi-cli` directly and
should be label so.
@candleindark
Copy link
Member

This one is ready for a final review.

@yarikoptic
Copy link
Member Author

FTR: The following look very problematic though not directly related to this PR.

except Exception:
# we just will not remove any errors, it is required so should be some
pass

I don't know its intent either.

yeah, better to be reanalyzed/improved. I think we burnt there a few times already

@yarikoptic yarikoptic mentioned this pull request Apr 4, 2025
2 tasks
https://zarr-specs.readthedocs.io/en/latest/specs.html, used in the Zarr data
object
"""
import zarr # Delay heavy import
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, apparently we have no single test on validation of zarrs? that's odd :-/

filed dedicated

for that

@yarikoptic
Copy link
Member Author

Thank you @candleindark , let's proceed with continuation in

@yarikoptic yarikoptic merged commit 88c3275 into master Apr 4, 2025
64 of 68 checks passed
@yarikoptic yarikoptic deleted the ext-validation branch April 4, 2025 12:56
@github-actions
Copy link

🚀 PR was released in 0.68.0 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

minor Increment the minor version when merged released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants