Skip to content

Upload test to check for metadata#1180

Merged
yarikoptic merged 4 commits intomasterfrom
metadata
Dec 21, 2022
Merged

Upload test to check for metadata#1180
yarikoptic merged 4 commits intomasterfrom
metadata

Conversation

@TheChymera
Copy link
Contributor

Closes: dandi/dandisets#220

The idea is to add a test which ensures that the upload code produces BIDS assets for which the correct data can always be retrieved. Currently we just check a few of the files being uploaded. I think it already works in any case, but would be a good add-on test.

@yarikoptic @jwodder any ideas how I can access the metadata of the remote asset? I tried to look at other tests which check metadata, but thus far I haven't found anything that does that.

@codecov
Copy link

codecov bot commented Dec 20, 2022

Codecov Report

Base: 89.15% // Head: 89.07% // Decreases project coverage by -0.08% ⚠️

Coverage data is based on head (288715e) compared to base (8e67482).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1180      +/-   ##
==========================================
- Coverage   89.15%   89.07%   -0.09%     
==========================================
  Files          76       76              
  Lines        9510     9444      -66     
==========================================
- Hits         8479     8412      -67     
- Misses       1031     1032       +1     
Flag Coverage Δ
unittests 89.07% <100.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/tests/test_upload.py 100.00% <100.00%> (ø)
dandi/support/threaded_walk.py 93.10% <0.00%> (-1.73%) ⬇️
dandi/dandiapi.py 87.11% <0.00%> (-1.17%) ⬇️
dandi/files/zarr.py 88.17% <0.00%> (-0.20%) ⬇️
dandi/download.py 88.02% <0.00%> (-0.03%) ⬇️
dandi/organize.py 83.54% <0.00%> (-0.01%) ⬇️
dandi/consts.py 100.00% <0.00%> (ø)
dandi/cli/cmd_organize.py 100.00% <0.00%> (ø)
dandi/tests/test_files.py 100.00% <0.00%> (ø)
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jwodder
Copy link
Contributor

jwodder commented Dec 20, 2022

@TheChymera dandiset.get_asset_by_path() returns a RemoteAsset object that has the methods get_raw_metadata() (returning a dict directly unserialized from JSON) and get_metadata() (returning a dandischema BaseAsset, if you're sure the metadata is valid)

@TheChymera
Copy link
Contributor Author

@jwodder thanks :)

So I can check for metadata and it's the correct metadata via get_raw_metadata(), but not with get_metadata().

for ddfe305:

PATH                                   SIZE    ERRORS UPLOAD STATUS MESSAGE
CHANGES
dataset_description.json
README
dandiset.yaml
sub-Sub1/perf/sub-Sub1_m0scan.nii.gz
sub-Sub1/perf/sub-Sub1_aslcontext.tsv
sub-Sub1/perf/sub-Sub1_m0scan.json
sub-Sub1/perf/sub-Sub1_asl.json
sub-Sub1/perf/sub-Sub1_asl.nii.gz
sub-Sub1/perf/sub-Sub1_asllabeling.jpg
sub-Sub1/anat/sub-Sub1_T1w.json
sub-Sub1/anat/sub-Sub1_T1w.nii.gz
Summary:                               0 Bytes
{'@context': 'https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.3/context.json',
 'access': [{'schemaKey': 'AccessRequirements', 'status': 'dandi:OpenAccess'}],
 'blobDateModified': '2022-12-20T10:43:59.299141-05:00',
 'contentSize': 1,
 'contentUrl': ['http://localhost:8000/api/assets/5ae1d41e-ad10-4060-95dc-1cd727442f13/download/',
                'http://localhost:9000/dandi-dandisets/blobs/fa6/d5d/fa6d5da9-e65a-49f5-bcee-d80afab32294'],
 'dateModified': '2022-12-20T10:44:09.520165-05:00',
 'digest': {'dandi:dandi-etag': '741e0f4bad5b093044dc54a74d911094-1',
            'dandi:sha2-256': '01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b'},
 'encodingFormat': 'application/gzip',
 'id': 'dandiasset:5ae1d41e-ad10-4060-95dc-1cd727442f13',
 'identifier': '5ae1d41e-ad10-4060-95dc-1cd727442f13',
 'path': 'sub-Sub1/anat/sub-Sub1_T1w.nii.gz',
 'schemaKey': 'Asset',
 'schemaVersion': '0.6.3',
 'wasAttributedTo': [{'identifier': 'Sub1', 'schemaKey': 'Participant'}],
 'wasGeneratedBy': [{'description': 'Metadata generated by DANDI cli',
                     'id': 'urn:uuid:af9cf642-ec7d-44f5-96ca-661a93103669',
                     'name': 'Metadata generation',
                     'schemaKey': 'Activity',
                     'wasAssociatedWith': [{'identifier': 'RRID:SCR_019009',
                                            'name': 'DANDI Command Line '
                                                    'Interface',
                                            'schemaKey': 'Software',
                                            'url': 'https://github.com/dandi/dandi-cli',
                                            'version': '0.46.6+217.gddfe305'}]}]}
AAAAAAAAAAAAAAAAAAAAAA
FAILED

=================================== FAILURES ===================================
__________________________ test_upload_bids_metadata ___________________________
test_upload.py:231: in test_upload_bids_metadata
    ).get_metadata()
../dandiapi.py:1308: in get_metadata
    return models.Asset.parse_obj(self.get_raw_metadata())
pydantic/main.py:526: in pydantic.main.BaseModel.parse_obj
    ???
pydantic/main.py:342: in pydantic.main.BaseModel.__init__
    ???
E   pydantic.error_wrappers.ValidationError: 2 validation errors for Asset
E   contentUrl -> 0
E     URL host invalid, top level domain required (type=value_error.url.host)
E   contentUrl -> 1
E     URL host invalid, top level domain required (type=value_error.url.host)

Does that mean the schema is somehow malformed? The actual error look like it has something to do with an invalid URL, though 🤔

@jwodder
Copy link
Contributor

jwodder commented Dec 20, 2022

@TheChymera Where were you running the tests to see that output? The environment variable DANDI_ALLOW_LOCALHOST_URLS needs to be set in order for dandischema to accept localhost URLs; it's automatically set when using tox or the GitHub Actions CI.

@TheChymera
Copy link
Contributor Author

TheChymera commented Dec 20, 2022

oh, yes, that was indeed it, I was running pytest directly to point it at the test I was working on:

chymera@decohost ~/src/dandi-cli/dandi/tests $ DANDI_TESTS_PERSIST_DOCKER_COMPOSE=1 DANDI_TESTS_PULL_DOCKER_COMPOSE=0 pytest -vvs test_upload.py::test_upload_bids_metadata | less

@TheChymera TheChymera marked this pull request as draft December 20, 2022 16:51
# Automatically check all files, heuristic should remain very BIDS-stable
for asset in dandiset.get_assets(order="path"):
apath = asset.path
if "Sub1" in apath:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just test on some known paths since if we later rename from Sib1, this test would pass without testing anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 288715e ? I like that we're checking all of them, at trivial additional time cost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not runtime which concerned me but possible absent checking if that dataset changes so that eg there is no subjects -- this test would still be valid. ok let's consider it unlikely and proceed

@TheChymera TheChymera marked this pull request as ready for review December 21, 2022 19:34
def test_upload_bids_metadata(
mocker: MockerFixture, bids_dandiset: SampleDandiset
) -> None:
bids_dandiset.upload(existing="force")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why upload needs to be forced?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you always do that... not sure if that is a good/clean practice, not liking it.. but not unique to this test, so let's proceed.

if "sub-" in apath:
metadata = dandiset.get_asset_by_path(apath).get_metadata()
# Hard-coded check for the subject identifier set in the fixture:
assert metadata.wasAttributedTo[0].identifier == "Sub1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very rudimentary bids dataset, not even with sessions or other interesting assets,... oh well, ok, at least we check that subject is extracted. We would need to improve later

@yarikoptic yarikoptic merged commit 9aae3bf into master Dec 21, 2022
@yarikoptic yarikoptic deleted the metadata branch December 21, 2022 22:01
@github-actions
Copy link

🚀 PR was released in 0.48.1 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BIDS: ensure that we extract all metadata we can (subj/ses/ some entities)

3 participants