[MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now #9763

adam2392 · 2021-09-20T15:50:05Z

Reference issue

Closes #9736
Closes #9774

What does this implement/fix?

This adds the public function mne.datasets.fetch_dataset and refactors the existing MNE datasets to use that function.

This also refactors the _data_path function to be more generic.
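As a rough illustration of what a per-dataset entry could look like, here is a minimal sketch built from the keys named in this PR's docstring (``archive_name``, ``url``, ``folder_name``, ``hash``, optional ``config_key``). The helper `check_dataset_params`, the dataset name, the URL and the hash are all hypothetical placeholders. The `fetch` wrapper only mirrors the `dataset_params=` and `processor=` keywords visible in the tracebacks in this thread; the exact shape of `dataset_params` (dict of dict vs. list of dict) was still under discussion, so treat it as an assumption.

```python
REQUIRED_KEYS = ("archive_name", "url", "folder_name", "hash")
OPTIONAL_KEYS = ("config_key",)


def check_dataset_params(params):
    """Raise ValueError if a per-dataset parameter dict is malformed.

    Hypothetical helper, not part of MNE.
    """
    missing = [key for key in REQUIRED_KEYS if key not in params]
    unknown = [key for key in params
               if key not in REQUIRED_KEYS + OPTIONAL_KEYS]
    if missing or unknown:
        raise ValueError(f"missing keys: {missing}; unknown keys: {unknown}")
    return params


# Placeholder entry: the URL and hash below are not a real dataset.
my_dataset = check_dataset_params(dict(
    archive_name="my_dataset.zip",
    url="https://example.com/my_dataset.zip",
    folder_name="MY-dataset",
    hash="md5:00000000000000000000000000000000",
    config_key="MY_DATASET_PATH",
))


def fetch(name, params, processor):
    """Download one dataset (needs mne, pooch and network access)."""
    import mne  # imported lazily so the sketch loads without mne installed
    # Assumption: dict-of-dict shape, as in the docstring quoted in this PR.
    return mne.datasets.fetch_dataset(dataset_params={name: params},
                                      processor=processor)
```

Validating the keys up front gives a clearer error than a failed download later on.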

mne/datasets/config.py

mne/datasets/ssvep/ssvep.py

adam2392

@drammock in your opinion what is the best way to pass in processor for the existing MNE datasets?

Option 1: Stored in the config.py file. See https://github.com/mne-tools/mne-python/pull/9763/files#r712535501, and see mne/datasets/testing/_testing.py

Option 2: Stored directly in each dataset's data_path() function. See https://github.com/mne-tools/mne-python/pull/9763/files#r712536490

drammock · 2021-09-21T02:09:52Z

option 1 seems better; I like it being in the same place as the hash. With option 2 it's not clear how users would handle private datasets that also needed unzipping/untarring. However you'll need to figure out how to still make pooch an optional dependency since with option 1 it's used outside of a function.
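One way to keep pooch optional under option 1 would be to store only a plain string in the config and resolve it to a pooch class inside a function. This is a sketch, not what the PR does: `CONFIG`, `make_processor` and the `my_private_dataset` entry are hypothetical names, and the `pooch_module` argument exists only so the logic can be exercised without pooch installed.

```python
import importlib

# Module-level config holds strings only, so importing it never needs pooch.
CONFIG = {
    "hf_sef": {"processor": "untar"},
    "my_private_dataset": {"processor": "unzip"},  # hypothetical entry
}

# Map the config string to the name of the pooch processor class.
_PROCESSOR_CLASS = {"unzip": "Unzip", "untar": "Untar"}


def make_processor(name, extract_dir, pooch_module=None):
    """Build the extraction processor for dataset ``name``."""
    if pooch_module is None:
        # Deferred import: pooch is only required when a download happens.
        pooch_module = importlib.import_module("pooch")
    kind = CONFIG[name]["processor"]
    processor_cls = getattr(pooch_module, _PROCESSOR_CLASS[kind])
    return processor_cls(extract_dir=extract_dir)
```

Datasets that need custom member lists or nested archives would still need extra logic at the call site, which is the limitation raised in the reply below.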

adam2392 · 2021-09-21T02:17:25Z

> option 1 seems better; I like it being in the same place as the hash. With option 2 it's not clear how users would handle private datasets that also needed unzipping/untarring. However you'll need to figure out how to still make pooch an optional dependency since with option 1 it's used outside of a function.

I played around with it, and there is no easy way to make it all option 1 because some of the datasets require some customized pathing anyways :/, so I went w/ option 2.

e.g.

    # instantiate a processor that untars the archive, extracting only
    # the listed members
    processor = pooch.Untar(
        extract_dir=path, members=[f'hf_sef/{subdir}' for subdir in
                                   ('MEG', 'SSS', 'subjects')])

As a result, we still need this code inside data_path() rather than in config. It actually seems less error-prone to store the extraction logic along w/ the type of processor in the same place. I don't see the zip/tar/nested-tar choice changing often enough for this to be an issue. The pattern is also pretty consistent for adding a new dataset.

Lmk wyt. I can also make a quick example to demonstrate mne.datasets.fetch_dataset if you think it's worthwhile, else I can just add more information to the doc string.

adam2392 · 2021-09-21T02:46:32Z

I guess I can actually just get rid of _data_path

mne/datasets/epilepsy_ecog/_data.py

adam2392 · 2021-09-21T14:07:36Z

Pair PR in: #9764

adam2392 · 2021-09-21T17:15:25Z

The CI failure looks like the recurring whitener test issue.

Here is the generated docs: https://34581-1301584-gh.circle-artifacts.com/0/dev/generated/mne.datasets.fetch.fetch_dataset.html#mne.datasets.fetch.fetch_dataset

mne/datasets/fetch.py

doc/datasets.rst

agramfort · 2021-09-22T06:24:49Z

If we already have a clear use case for the fetch_dataset function, then let's do it.

Two things:

- I would expose it as mne.datasets.fetch_dataset.
- I would remove the _data_path that now seems superfluous.

thanks for the clarifications @adam2392

mne/datasets/config.py

mne/datasets/fetch.py

drammock · 2021-09-23T01:23:49Z

No strong opinion about that, so leave config.py as-is if you think that's cleaner.

On Sep 22, 2021, 14:42, Adam Li (@adam2392) commented in mne/datasets/fetch.py:

    + dataset_params : dict of dict
    + The dataset name and corresponding parameters to download the dataset.
    + The dataset parameters that contains the following keys:
    + ``archive_name``, ``url``, ``folder_name``, ``hash``,
    + ``config_key`` (optional). See Notes.

> Makes sense to me. Should I change the format in config.py, or handle internally the MNE datasets to go from dict of dict -> list of dict if needed when calling fetch_dataset? I think the format in config.py is nice cuz it consolidates LOC per dataset.

ST-subjects.xls

SC-subjects.xls

larsoner

See bugs reported in #9774 (comment)

adam2392

The main changes are in:

mne/datasets/utils.py
mne/datasets/fetch.py

larsoner

LGTM, will merge once CIs are happy to get CircleCI green again!

@drammock @agramfort feel free to comment as well but from what I can see @adam2392 has addressed your comments. If I missed something then feel free to comment and I'm sure @adam2392 will be happy to make another PR :)

larsoner · 2021-09-24T03:23:10Z

I still don't understand the SphinxWindows failure, but this one on CircleCI seems legit (I can replicate locally):

$ python -c "import mne; mne.datasets.fieldtrip_cmc.data_path(verbose=True)"
Using default location ~/mne_data for fieldtrip_cmc...
Downloading file 'SubjectCMC.zip' from 'https://osf.io/j9b6s/download?version=1' to '/Users/larsoner/mne_data'.
100%|███████████████████████████████████████| 329M/329M [00:00<00:00, 1.76TB/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<decorator-gen-500>", line 22, in data_path
  File "/Users/larsoner/python/mne-python/mne/datasets/fieldtrip_cmc/fieldtrip_cmc.py", line 20, in data_path
    return _download_mne_dataset(
  File "/Users/larsoner/python/mne-python/mne/datasets/utils.py", line 175, in _download_mne_dataset
    return fetch_dataset(dataset_params=dataset_params, processor=processor_,
  File "/Users/larsoner/python/mne-python/mne/datasets/_fetch.py", line 260, in fetch_dataset
    fetcher.fetch(
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/core.py", line 549, in fetch
    return processor(str(full_path), action, self)
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/processors.py", line 86, in __call__
    self._extract_file(fname, self.extract_dir)
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/site-packages/pooch/processors.py", line 196, in _extract_file
    with TarFile.open(fname, "r") as tar_file:
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/tarfile.py", line 1604, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

agramfort · 2021-09-24T10:33:55Z

+1 for MRG if the CI failure is unrelated

adam2392 · 2021-09-24T14:11:59Z

> I still don't understand the SphinxWindows failure, but this one on CircleCI seems legit (I can replicate locally): [fieldtrip_cmc traceback, quoted in full in larsoner's comment above]

Ah yeah the file is a zip file, so we just needed to make it nested_unzip instead of nested_untar.

Original code I copied over was accidentally nested_untar (Ref: https://github.com/mne-tools/mne-python/pull/9742/files)
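For context, the `tarfile.ReadError` above is what the standard library raises when asked to open a zip archive as a tar file. A small standard-library sketch of guarding against that kind of mismatch by inspecting the file content rather than trusting the configured processor; `guess_processor` is a hypothetical helper, not something in MNE or pooch, and the check is a heuristic:

```python
import tarfile
import zipfile


def guess_processor(archive_path):
    """Return "untar" or "unzip" based on the file's content.

    The tar check runs first because a tar archive that happens to
    contain a zip file could otherwise be mistaken for a zip archive.
    """
    if tarfile.is_tarfile(archive_path):
        return "untar"
    if zipfile.is_zipfile(archive_path):
        return "unzip"
    raise ValueError(f"Not a zip or tar archive: {archive_path}")
```

A check like this could run in a test over every downloaded archive, so a wrong unzip/untar setting fails early with a clear message.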

adam2392 · 2021-09-24T15:17:35Z

With the exception of the three Windows Azure CI failures, which I can't explain, this is good to go by me.

I also tested it with my downstream repo, and fetch_dataset works nicely for a private GitHub repo.

Lmk if you find other things I need to fix.

larsoner · 2021-09-24T22:32:04Z

I don't understand the Windows Sphinx error -- I'll probably merge ignoring that assuming the empty [circle full] commit I just pushed succeeds. I can look into the Windows Sphinx error on Monday if it persists.

larsoner · 2021-09-24T22:53:48Z

Downloading file 'sample_reference_MEG_noise-raw.zip' from 'https://osf.io/drt6v/download?version=1' to '/home/circleci/mne_data'.
100%|██████████████████████████████████████| 91.0M/91.0M [00:00<00:00, 102GB/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<decorator-gen-497>", line 22, in _download_all_example_data
  File "/home/circleci/project/mne/datasets/utils.py", line 264, in _download_all_example_data
    refmeg_noise.data_path()
  File "<decorator-gen-527>", line 24, in data_path
  File "/home/circleci/project/mne/datasets/refmeg_noise/refmeg_noise.py", line 17, in data_path
    return _download_mne_dataset(
  File "/home/circleci/project/mne/datasets/utils.py", line 177, in _download_mne_dataset
    return fetch_dataset(dataset_params=dataset_params, processor=processor_,
  File "/home/circleci/project/mne/datasets/_fetch.py", line 260, in fetch_dataset
    fetcher.fetch(
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/core.py", line 549, in fetch
    return processor(str(full_path), action, self)
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/processors.py", line 86, in __call__
    self._extract_file(fname, self.extract_dir)
  File "/home/circleci/.local/lib/python3.8/site-packages/pooch/processors.py", line 196, in _extract_file
    with TarFile.open(fname, "r") as tar_file:
  File "/usr/local/lib/python3.8/tarfile.py", line 1606, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

@adam2392 can you try to fix, and when you make your commit and push have [circle full] in the message?

adam2392 · 2021-09-25T14:04:45Z

So I think all the downloads work now, but this is a weird error:

 File "/home/circleci/project/tutorials/clinical/20_seeg.py", line 72, in <module>
    head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir)
  File "<decorator-gen-74>", line 24, in estimate_head_mri_t
  File "/home/circleci/project/mne/_freesurfer.py", line 445, in estimate_head_mri_t
    lpa, nasion, rpa = get_mni_fiducials(subject, subjects_dir)
  File "<decorator-gen-73>", line 24, in get_mni_fiducials
  File "/home/circleci/project/mne/_freesurfer.py", line 419, in get_mni_fiducials
    mni_mri_t = invert_transform(read_talxfm(subject, subjects_dir))
  File "<decorator-gen-75>", line 24, in read_talxfm
  File "/home/circleci/project/mne/_freesurfer.py", line 485, in read_talxfm
    ras_mni_t = read_ras_mni_t(subject, subjects_dir)
  File "/home/circleci/project/mne/transforms.py", line 1466, in read_ras_mni_t
    fname = _check_fname(
  File "/home/circleci/project/mne/utils/check.py", line 180, in _check_fname
    raise FileNotFoundError(f'{name} does not exist: {fname}')
FileNotFoundError: FreeSurfer Talairach transformation file does not exist: /home/circleci/mne_data/MNE-sample-data/subjects/sample_seeg/mri/transforms/talairach.xfm

I see sample_seeg/ folder in MNE-misc-data/seeg/ though not in the sample-data. Unless the version is wrong??

adam2392 · 2021-09-25T14:21:31Z

tutorials/clinical/20_seeg.py

-head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir)
+this_subject_dir = op.join(misc_path, 'seeg')
+head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', this_subject_dir)


@larsoner not sure how this ever passed?

I went to: 034ff3f

and tried it locally and it didn't work.

This should fix though. Lmk if this is incorrect.

Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200
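To make the failure mode concrete: the traceback shows the transform being looked up at `<subjects_dir>/<subject>/mri/transforms/talairach.xfm`. A minimal sketch, using hypothetical helper names, that picks whichever candidate directory actually contains that file for a given subject:

```python
import os.path as op


def talairach_path(subject, subjects_dir):
    """Path layout taken from the FileNotFoundError message above."""
    return op.join(subjects_dir, subject, 'mri', 'transforms',
                   'talairach.xfm')


def pick_subjects_dir(subject, candidates):
    """Return the first candidate directory holding the subject's transform."""
    for subjects_dir in candidates:
        if op.isfile(talairach_path(subject, subjects_dir)):
            return subjects_dir
    raise FileNotFoundError(
        f'No talairach.xfm found for {subject} in any of: {list(candidates)}')
```

In the tutorial this would amount to passing both the sample `subjects_dir` and `op.join(misc_path, 'seeg')` as candidates. Hard-coding the misc path, as in the diff above, is simpler when the location is known.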

adam2392 · 2021-09-25T17:35:20Z

mne/datasets/config.py

 # update the checksum in `mne/data/dataset_checksums.txt` and change version
 # here:                  ↓↓↓↓↓         ↓↓↓
-RELEASES = dict(testing='0.123', misc='0.18')
+RELEASES = dict(testing='0.123', misc='0.22')


The misc dataset version had not been updated since before pooch went in. From mne-python/mne/datasets/utils.py, line 257 in 034ff3f:

    releases = dict(testing='0.123', misc='0.19')

So I updated it here. This results in a few examples and tests breaking because it looks like sample_ecog.mat is replaced by sample_ecog_ieeg.fif. I point out where I updated in those files.

adam2392 · 2021-09-25T17:35:50Z

mne/export/tests/test_export.py

    elif dataset == 'misc':
-        fname = op.join(misc.data_path(), 'ecog', 'sample_ecog.edf')
-        raw = read_raw_edf(fname)
+        fname = op.join(misc.data_path(), 'ecog', 'sample_ecog_ieeg.fif')


Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200

adam2392 · 2021-09-25T17:36:11Z

examples/visualization/3d_to_2d.py

 subjects_dir = op.join(sample_path, 'subjects')
 misc_path = mne.datasets.misc.data_path()
-ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog.mat')
+ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog_ieeg.fif')


Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200

adam2392 · 2021-09-25T17:38:55Z

So in summary the circle CI fixes are:

- some datasets were set to unzip instead of untar (and vice versa), which was never caught in the original pooch PR
- some other PRs never caught the change in the misc dataset, which renamed some data files and thus broke a few tests and examples

This PR now addresses those and updates the misc dataset to v0.22. Fingers crossed for this [circle full] commit.

larsoner · 2021-09-25T23:40:37Z

Green, in it goes!

rob-luke · 2021-09-26T00:09:56Z

Great stuff @adam2392 🎉 I'm excited to add a few datasets to MNE-NIRS using this, thanks.

adam2392 marked this pull request as draft September 20, 2021 15:50

adam2392 mentioned this pull request Sep 20, 2021

[MRG] Move files in config to a python file #9762

Merged

adam2392 force-pushed the fetchapi branch from 436c624 to 1de9199 Compare September 20, 2021 21:33

adam2392 commented Sep 20, 2021

mne/datasets/config.py

adam2392 commented Sep 20, 2021

mne/datasets/ssvep/ssvep.py

adam2392 commented Sep 20, 2021

rob-luke mentioned this pull request Sep 20, 2021

Dataset downloading is broken until MNE-python is fixed mne-tools/mne-nirs#386

Closed

drammock mentioned this pull request Sep 20, 2021

hackish workaround for changes to downloading mne-tools/mne-nirs#388

Merged

adam2392 force-pushed the fetchapi branch from 3e94e5b to 4ebd1d9 Compare September 21, 2021 00:56

adam2392 marked this pull request as ready for review September 21, 2021 02:14

adam2392 changed the title ~~[DRAFT] Fetching Datasets Public API~~ [MRG] Fetching Datasets Public API Sep 21, 2021

adam2392 changed the title ~~[MRG] Fetching Datasets Public API~~ [MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now Sep 21, 2021

agramfort reviewed Sep 21, 2021

mne/datasets/epilepsy_ecog/_data.py

adam2392 requested a review from agramfort September 21, 2021 17:16

agramfort reviewed Sep 21, 2021

mne/datasets/fetch.py

doc/datasets.rst

drammock reviewed Sep 22, 2021

mne/datasets/config.py

mne/datasets/fetch.py

mne/datasets/fetch.py

mne/datasets/fetch.py

adam2392 mentioned this pull request Sep 22, 2021

BUG: Pooch fails to download on CircleCI #9774

Closed

larsoner reviewed Sep 23, 2021

ST-subjects.xls

SC-subjects.xls

larsoner requested changes Sep 23, 2021

adam2392 commented Sep 23, 2021

larsoner approved these changes Sep 23, 2021

adam2392 mentioned this pull request Sep 23, 2021

has_dataset should rely on dataset_params in mne.datasets.utils #9776

Closed

Draft

753e6a7

larsoner added 2 commits September 23, 2021 16:41

FIX: Path

1413699

FIX: Try [circle full]

dc16927

FIX: Missed

23939f0

Fix fieldtrip cmc

2af4fbe

Try full [circle full]

74180ea

adam2392 added 3 commits September 24, 2021 19:08

Try again [circle full]

bab479b

Merge branch 'fetchapi' of github.com:adam2392/mne-python into fetchapi

53c80dd

try full [circle full]

adf321d

Try again [circle full]

b7f3fbb

adam2392 commented Sep 25, 2021

adam2392 added 2 commits September 25, 2021 10:42

Update the misc button [circle full]

d3419d0

Fix ci [circle full]

066b6dc

adam2392 requested a review from sappelhoff as a code owner September 25, 2021 17:33

adam2392 commented Sep 25, 2021

larsoner merged commit 0b503d8 into mne-tools:main Sep 25, 2021

adam2392 deleted the fetchapi branch September 26, 2021 02:51

rob-luke mentioned this pull request Sep 28, 2021

Use MNE-Python fetch_dataset function mne-tools/mne-nirs#394

Merged

larsoner added a commit to larsoner/mne-python that referenced this pull request Oct 7, 2021

Revert "[MRG, MAINT] Fetching Datasets Public API "fetch_dataset" use…

1b893a9

…d by all internal MNE datasets now (mne-tools#9763)" This reverts commit 0b503d8.
