-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now #9763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
adam2392
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drammock in your opinion what is the best way to pass in processor for the existing MNE datasets?
Option 1: Stored in the config.py1 file. See https://github.com/mne-tools/mne-python/pull/9763/files#r712535501, and see mne/datasets/testing/_testing.py`
Option 2: Stored directly in each dataset's data_path() function. See https://github.com/mne-tools/mne-python/pull/9763/files#r712536490
|
option 1 seems better; I like it being in the same place as the hash. With option 2 it's not clear how users would handle private datasets that also needed unzipping/untarring. However you'll need to figure out how to still make pooch an optional dependency since with option 1 it's used outside of a function. |
I played around with it, and there is no easy way to make it all option 1 because some of the datasets require some customized pathing anyways :/, so I went w/ option 2. e.g. We still need this code inside Lmk wyt. I can also make a quick example to demonstrate |
|
I guess I can actually just get rid of _data_path |
|
Pair PR in: #9764 |
|
CI Failure seems like its the recurring test in whitener issue. Here is the generated docs: https://34581-1301584-gh.circle-artifacts.com/0/dev/generated/mne.datasets.fetch.fetch_dataset.html#mne.datasets.fetch.fetch_dataset |
|
if we already have a clear usecase for fetch_dataset function then let's do it. 2 things:
thanks for the clarifications @adam2392 |
|
No strong opinion about that, so leave config.py as-is if you think that's cleaner.
Sent from ProtonMail mobile
…-------- Original Message --------
On Sep 22, 2021, 14:42, Adam Li wrote:
@adam2392 commented on this pull request.
---------------------------------------------------------------
In [mne/datasets/fetch.py](#9763 (comment)):
> + dataset_params : dict of dict
+ The dataset name and corresponding parameters to download the dataset.
+ The dataset parameters that contains the following keys:
+ ``archive_name``, ``url``, ``folder_name``, ``hash``,
+ ``config_key`` (optional). See Notes.
Makes sense to me.
Should I change the format in config.py, or handle internally the MNE datasets to go from dict of dict -> list of dict if needed when calling fetch_dataset?
I think the format in config.py is nice cuz it consolidates LOC per dataset.
—
You are receiving this because you were mentioned.
Reply to this email directly, [view it on GitHub](#9763 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AAN2AU25CLZTZYOBTO2SUXDUDJES5ANCNFSM5EMIKLWQ).
|
larsoner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See bugs reported in #9774 (comment)
adam2392
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main changes are in:
- mne/datasets/utils.py
- mne/datasets/fetch.py
larsoner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, will merge once CIs are happy to get CircleCI green again!
@drammock @agramfort feel free to comment as well but from what I can see @adam2392 has addressed your comments. If I missed something then feel free to comment and I'm sure @adam2392 will be happy to make another PR :)
|
I still don't understand the SphinxWindows failure, but this one on CircleCI seems legit (I can replicate locally): |
|
+1 for MRG if Ci failure is unrelated |
Ah yeah the file is a zip file, so we just needed to make it Original code I copied over was accidentally nested_untar (Ref: https://github.com/mne-tools/mne-python/pull/9742/files) |
|
With the exception of the 3 windows Azure CI failures that I have no idea on, this is good to go by me. I also tested it out w/ my downstream repo and the fetch_dataset works nicely for a private GITHUB repo. lmk if you find other things I need to fix. |
|
I don't understand the Windows Sphinx error -- I'll probably merge ignoring that assuming the empty |
|
Next: Details@adam2392 can you try to fix, and when you make your commit and push have |
|
So I think all the downloads work now, but this is a weird error: I see |
| head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir) | ||
| this_subject_dir = op.join(misc_path, 'seeg') | ||
| head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', this_subject_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix though. Lmk if this is incorrect.
Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200
| # update the checksum in `mne/data/dataset_checksums.txt` and change version | ||
| # here: ↓↓↓↓↓ ↓↓↓ | ||
| RELEASES = dict(testing='0.123', misc='0.18') | ||
| RELEASES = dict(testing='0.123', misc='0.22') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The misc dataset was never updated since before pooch went in
mne-python/mne/datasets/utils.py
Line 257 in 034ff3f
| releases = dict(testing='0.123', misc='0.19') |
So I updated it here. This results in a few examples and tests breaking because it looks like sample_ecog.mat is replaced by sample_ecog_ieeg.fif. I point out where I updated in those files.
| elif dataset == 'misc': | ||
| fname = op.join(misc.data_path(), 'ecog', 'sample_ecog.edf') | ||
| raw = read_raw_edf(fname) | ||
| fname = op.join(misc.data_path(), 'ecog', 'sample_ecog_ieeg.fif') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| subjects_dir = op.join(sample_path, 'subjects') | ||
| misc_path = mne.datasets.misc.data_path() | ||
| ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog.mat') | ||
| ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog_ieeg.fif') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
So in summary the circle CI fixes are:
This PR now addresses those and updated |
|
Green, in it goes! |
|
Great stuff @adam2392 🎉 Im excited to add a few datasets to MNE-NIRS using this, thanks. |
…d by all internal MNE datasets now (mne-tools#9763)" This reverts commit 0b503d8.
Reference issue
Closes #9736
Closes #9774
What does this implement/fix?
This adds the public function
mne.datasets.fetch_datasetand refactors the existing MNE datasets to use that function.This also refactors
_data_pathfunction to be more generic.