
@adam2392
Member

@adam2392 adam2392 commented Sep 20, 2021

Reference issue

Closes #9736
Closes #9774

What does this implement/fix?

This adds the public function mne.datasets.fetch_dataset and refactors the existing MNE datasets to use that function.

This also refactors the _data_path function to be more generic.

Member Author

@adam2392 adam2392 left a comment


@drammock, in your opinion, what is the best way to pass in the processor for the existing MNE datasets?

Option 1: Stored in the config.py file. See https://github.com/mne-tools/mne-python/pull/9763/files#r712535501, and see mne/datasets/testing/_testing.py.

Option 2: Stored directly in each dataset's data_path() function. See https://github.com/mne-tools/mne-python/pull/9763/files#r712536490

@drammock
Member

Option 1 seems better; I like it being in the same place as the hash. With option 2 it's not clear how users would handle private datasets that also need unzipping/untarring. However, you'll need to figure out how to keep pooch an optional dependency, since with option 1 it's used outside of a function.

@adam2392 adam2392 marked this pull request as ready for review September 21, 2021 02:14
@adam2392
Member Author

I played around with it, and there is no easy way to do everything with option 1, because some of the datasets require customized paths anyway :/, so I went with option 2.

For example:

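    import pooch

    # instantiate a processor that untars only the listed members of the
    # archive into `path` (the dataset's destination directory)
    processor = pooch.Untar(
        extract_dir=path, members=[f'hf_sef/{subdir}' for subdir in
                                   ('MEG', 'SSS', 'subjects')])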

As a result, this code still needs to live inside data_path() rather than in the config. It actually seems less error-prone to keep the extraction logic and the processor type in the same place. I don't expect the archive type (zip/tar/nested tar) to change often enough for this to be a problem, and the pattern for adding a new dataset stays consistent.

Let me know what you think. I can also make a quick example to demonstrate mne.datasets.fetch_dataset if you think it's worthwhile; otherwise, I can just add more information to the docstring.
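A hedged sketch of what calling mne.datasets.fetch_dataset for a third-party dataset might look like. The dataset_params keys (dataset_name, archive_name, hash, url, folder_name, config_key) follow the function's docstring as far as I can tell, but every value below is a placeholder, and the keyword arguments should be checked against the generated docs for the installed MNE version. my_dataset_data_path is a hypothetical wrapper, not an MNE function.

```python
# Placeholder parameters for a hypothetical dataset; none of these values
# point to a real archive.
dataset_params = dict(
    dataset_name="my_dataset",
    archive_name="my_dataset.zip",
    hash="md5:<checksum of my_dataset.zip>",
    url="https://example.com/my_dataset.zip",
    folder_name="MNE-my-dataset",
    config_key="MNE_DATASETS_MY_DATASET_PATH",
)


def my_dataset_data_path(path=None):
    """Download (if needed) and return the local path of the dataset."""
    # Imported here so that defining this helper needs neither mne nor pooch.
    import pooch
    from mne.datasets import fetch_dataset

    # The archive is a zip, so pair it with an unzip processor; pooch
    # resolves a relative extract_dir against the download directory.
    processor = pooch.Unzip(extract_dir=dataset_params["folder_name"])
    return fetch_dataset(dataset_params, processor=processor, path=path)
```

This mirrors the option-2 pattern above: the processor is built right next to the call that needs it.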

@adam2392 adam2392 changed the title [DRAFT] Fetching Datasets Public API [MRG] Fetching Datasets Public API Sep 21, 2021
@adam2392 adam2392 changed the title [MRG] Fetching Datasets Public API [MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now Sep 21, 2021
@adam2392
Member Author

I guess I can actually just get rid of _data_path

@adam2392
Member Author

Paired PR: #9764

@adam2392
Member Author

adam2392 commented Sep 21, 2021

The CI failure seems to be the recurring whitener test issue.

Here are the generated docs: https://34581-1301584-gh.circle-artifacts.com/0/dev/generated/mne.datasets.fetch.fetch_dataset.html#mne.datasets.fetch.fetch_dataset

@adam2392 adam2392 requested a review from agramfort September 21, 2021 17:16
@agramfort
Member

If we already have a clear use case for a fetch_dataset function, then let's do it.

2 things:

  • I would expose it as mne.datasets.fetch_dataset
  • I would remove the _data_path that now seems superfluous.

Thanks for the clarifications, @adam2392.

@drammock
Member

drammock commented Sep 23, 2021 via email

Member

@larsoner larsoner left a comment


See bugs reported in #9774 (comment)

Member Author

@adam2392 adam2392 left a comment


The main changes are in:

  • mne/datasets/utils.py
  • mne/datasets/fetch.py

Member

@larsoner larsoner left a comment


LGTM, will merge once CIs are happy to get CircleCI green again!

@drammock @agramfort feel free to comment as well, but from what I can see @adam2392 has addressed your comments. If I missed something, feel free to comment, and I'm sure @adam2392 will be happy to make another PR :)

@larsoner
Member

I still don't understand the SphinxWindows failure, but this one on CircleCI seems legit (I can replicate locally):

$ python -c "import mne; mne.datasets.fieldtrip_cmc.data_path(verbose=True)"
Using default location ~/mne_data for fieldtrip_cmc...
Downloading file 'SubjectCMC.zip' from 'https://osf.io/j9b6s/download?version=1' to '/Users/larsoner/mne_data'.
100%|███████████████████████████████████████| 329M/329M [00:00<00:00, 1.76TB/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<decorator-gen-500>", line 22, in data_path
  File "/Users/larsoner/python/mne-python/mne/datasets/fieldtrip_cmc/fieldtrip_cmc.py", line 20, in data_path
    return _download_mne_dataset(
  File "/Users/larsoner/python/mne-python/mne/datasets/utils.py", line 175, in _download_mne_dataset
    return fetch_dataset(dataset_params=dataset_params, processor=processor_,
  File "/Users/larsoner/python/mne-python/mne/datasets/_fetch.py", line 260, in fetch_dataset
    fetcher.fetch(
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/core.py", line 549, in fetch
    return processor(str(full_path), action, self)
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/processors.py", line 86, in __call__
    self._extract_file(fname, self.extract_dir)
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/processors.py", line 196, in _extract_file
    with TarFile.open(fname, "r") as tar_file:
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/tarfile.py", line 1604, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

@agramfort
Member

+1 for MRG if the CI failure is unrelated.

@adam2392
Member Author

Ah yeah, the file is a zip file, so we just needed to make it nested_unzip instead of nested_untar.

The original code I copied over accidentally used nested_untar (ref: https://github.com/mne-tools/mne-python/pull/9742/files).
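The traceback comes from handing a zip archive to pooch's tar extractor, which is what raises tarfile.ReadError. Switching the processor name is the actual fix here. As a complementary sanity check (a hypothetical helper, not part of this PR), the standard library can tell the two formats apart from the file contents rather than from the configured processor:

```python
import tarfile
import zipfile


def guess_processor_kind(fname):
    """Return 'unzip' or 'untar' depending on the archive's actual content."""
    # Check zip first: zipfile.is_zipfile looks for the end-of-central-
    # directory record, which a tar archive does not contain.
    if zipfile.is_zipfile(fname):
        return "unzip"
    if tarfile.is_tarfile(fname):
        return "untar"
    raise ValueError(f"{fname} is neither a zip nor a tar archive")
```

Run against each downloaded archive in a test, something like this would have flagged the zip/tar mix-ups before CircleCI did.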

@adam2392
Member Author

Except for the three Windows Azure CI failures, which I have no idea about, this is good to go from my side.

I also tested it with my downstream repo, and fetch_dataset works nicely for a private GitHub repo.

Let me know if you find anything else I need to fix.
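For the private-repo case, the general idea is to send a personal access token with each request. A rough sketch of that mechanism, assuming a GitHub token in the Authorization header and pooch's HTTPDownloader (which forwards extra keyword arguments to requests.get). The helper names are hypothetical, and this is not necessarily how fetch_dataset implements its optional token.

```python
def github_auth_headers(token=None):
    """Return request headers for a download that may need authentication."""
    # Without a token the request stays anonymous (public repositories).
    if token is None:
        return {}
    return {"Authorization": f"token {token}"}


def make_downloader(token=None):
    """Build a pooch downloader that sends the auth header, if any."""
    import pooch  # deferred: only needed when a download actually happens

    # HTTPDownloader passes extra keyword arguments on to requests.get.
    return pooch.HTTPDownloader(headers=github_auth_headers(token))
```

The resulting downloader can be passed as downloader= to pooch.retrieve or Pooch.fetch. Reading the token from an environment variable, rather than hard-coding it, keeps it out of the repository.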

@larsoner
Member

I don't understand the Windows Sphinx error. I'll probably merge and ignore it, assuming the empty [circle full] commit I just pushed succeeds. I can look into the Windows Sphinx error on Monday if it persists.

@larsoner
Member

Next:

https://circleci.com/gh/mne-tools/mne-python/34733?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Downloading file 'sample_reference_MEG_noise-raw.zip' from 'https://osf.io/drt6v/download?version=1' to '/home/circleci/mne_data'.
100%|██████████████████████████████████████| 91.0M/91.0M [00:00<00:00, 102GB/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<decorator-gen-497>", line 22, in _download_all_example_data
  File "/home/circleci/project/mne/datasets/utils.py", line 264, in _download_all_example_data
    refmeg_noise.data_path()
  File "<decorator-gen-527>", line 24, in data_path
  File "/home/circleci/project/mne/datasets/refmeg_noise/refmeg_noise.py", line 17, in data_path
    return _download_mne_dataset(
  File "/home/circleci/project/mne/datasets/utils.py", line 177, in _download_mne_dataset
    return fetch_dataset(dataset_params=dataset_params, processor=processor_,
  File "/home/circleci/project/mne/datasets/_fetch.py", line 260, in fetch_dataset
    fetcher.fetch(
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/core.py", line 549, in fetch
    return processor(str(full_path), action, self)
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/processors.py", line 86, in __call__
    self._extract_file(fname, self.extract_dir)
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/processors.py", line 196, in _extract_file
    with TarFile.open(fname, "r") as tar_file:
  File "/usr/local/lib/python3.8/tarfile.py", line 1606, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

@adam2392 can you try to fix it? When you commit and push, include [circle full] in the commit message.

@adam2392
Member Author

So I think all the downloads work now, but this is a weird error:

  File "/home/circleci/project/tutorials/clinical/20_seeg.py", line 72, in <module>
    head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir)
  File "<decorator-gen-74>", line 24, in estimate_head_mri_t
  File "/home/circleci/project/mne/_freesurfer.py", line 445, in estimate_head_mri_t
    lpa, nasion, rpa = get_mni_fiducials(subject, subjects_dir)
  File "<decorator-gen-73>", line 24, in get_mni_fiducials
  File "/home/circleci/project/mne/_freesurfer.py", line 419, in get_mni_fiducials
    mni_mri_t = invert_transform(read_talxfm(subject, subjects_dir))
  File "<decorator-gen-75>", line 24, in read_talxfm
  File "/home/circleci/project/mne/_freesurfer.py", line 485, in read_talxfm
    ras_mni_t = read_ras_mni_t(subject, subjects_dir)
  File "/home/circleci/project/mne/transforms.py", line 1466, in read_ras_mni_t
    fname = _check_fname(
  File "/home/circleci/project/mne/utils/check.py", line 180, in _check_fname
    raise FileNotFoundError(f'{name} does not exist: {fname}')
FileNotFoundError: FreeSurfer Talairach transformation file does not exist: /home/circleci/mne_data/MNE-sample-data/subjects/sample_seeg/mri/transforms/talairach.xfm

I see the sample_seeg/ folder in MNE-misc-data/seeg/, but not in the sample data. Unless the version is wrong?

Comment on lines -72 to +73
-head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir)
+this_subject_dir = op.join(misc_path, 'seeg')
+head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', this_subject_dir)
Member Author

@larsoner, I'm not sure how this ever passed?

I checked out 034ff3f and tried it locally, and it didn't work.

Member Author

@adam2392 adam2392 Sep 25, 2021

This should fix it, though. Let me know if this is incorrect.

Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200

 # update the checksum in `mne/data/dataset_checksums.txt` and change version
 # here: ↓↓↓↓↓ ↓↓↓
-RELEASES = dict(testing='0.123', misc='0.18')
+RELEASES = dict(testing='0.123', misc='0.22')
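Bumping a release also means updating the archive's checksum, as the comment in the diff says. A small sketch for computing it locally with hashlib, assuming the checksums file stores MD5 hex digests (check the existing entries in mne/data/dataset_checksums.txt before relying on that); pooch.file_hash is an alternative when pooch is already installed. md5_of_file is a hypothetical helper name.

```python
import hashlib


def md5_of_file(fname, chunk_size=2**20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(fname, "rb") as fid:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fid.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Reading in chunks keeps memory flat even for multi-hundred-megabyte archives like the ones downloaded above.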
Member Author

The misc dataset version has not been updated since before pooch went in:

releases = dict(testing='0.123', misc='0.19')

So I updated it here. This breaks a few examples and tests, because it looks like sample_ecog.mat was replaced by sample_ecog_ieeg.fif. I've pointed out where I updated those files.

 elif dataset == 'misc':
-    fname = op.join(misc.data_path(), 'ecog', 'sample_ecog.edf')
-    raw = read_raw_edf(fname)
+    fname = op.join(misc.data_path(), 'ecog', 'sample_ecog_ieeg.fif')
Member Author

 subjects_dir = op.join(sample_path, 'subjects')
 misc_path = mne.datasets.misc.data_path()
-ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog.mat')
+ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog_ieeg.fif')

@adam2392
Member Author

In summary, the CircleCI fixes are:

  • some datasets used unzip instead of untar (and vice versa), which was never caught in the original pooch PR
  • some other PRs missed the change in the misc dataset that renamed some data files, which broke a few tests and examples

This PR now addresses both and updates the misc dataset to v0.22. Fingers crossed for this [circle full] commit.

@larsoner
Member

Green, in it goes!

@larsoner larsoner merged commit 0b503d8 into mne-tools:main Sep 25, 2021
@rob-luke
Member

Great stuff @adam2392 🎉 I'm excited to add a few datasets to MNE-NIRS using this, thanks.

@adam2392 adam2392 deleted the fetchapi branch September 26, 2021 02:51
larsoner added a commit to larsoner/mne-python that referenced this pull request Oct 7, 2021
…d by all internal MNE datasets now (mne-tools#9763)"

This reverts commit 0b503d8.

Development

Successfully merging this pull request may close these issues.

BUG: Pooch fails to download on CircleCI
Adding an optional token to the dataset fetcher code to allow optional fetching from private repositories

5 participants