
Conversation

@adam2392 (Member)

Reference issue

First part of #9736

What does this implement/fix?

Moves the contents of the dataset_checksums file into a dictionary of dictionaries, where each inner dictionary uniquely defines one dataset.
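
As a rough illustration of the layout (all field values below are placeholders; the field names come from the parameter list in the next section, and the real values live in the config module added by this PR):

```python
# Placeholder sketch of one entry in the dict-of-dicts; every value here
# is made up for illustration.
MNE_DATASETS = dict(
    fnirs_motor=dict(
        archive_name='MNE-fNIRS-motor-data.tgz',      # placeholder
        folder_name='MNE-fNIRS-motor-data',           # placeholder
        url='https://example.com/data.tgz',           # placeholder
        hash='md5:0123456789abcdef0123456789abcdef',  # placeholder
        config_key='MNE_DATASETS_FNIRS_MOTOR_PATH',   # placeholder key
    ),
)
```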

Additional information

The next phase would be to:

  • create a public-facing function, mne.datasets.fetch_dataset(), which takes a dictionary of parameters that must define archive_name, folder_name, url, and hash, and can optionally include config_key and token (see the sketch after this list)
  • refactor _data_path to accept these parameters instead
  • refactor the functions that call _data_path internally to leverage MNE_DATASETS when passing the necessary parameters to _data_path.
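
A rough sketch of how the eventual call might look; fetch_dataset() does not exist yet at this point, so the signature is purely illustrative:

```python
from mne.datasets import fetch_dataset  # proposed in this PR; not yet real

# The parameter dict must define archive_name, folder_name, url, and hash,
# and may optionally carry config_key and token.
path = fetch_dataset(MNE_DATASETS['fnirs_motor'])
```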

@adam2392 changed the title from "[WIP] Move files into config to" to "[WIP] Move files into config to a python file" Sep 20, 2021
@adam2392 changed the title from "[WIP] Move files into config to a python file" to "[MRG] Move files into config to a python file" Sep 20, 2021
@drammock (Member) left a comment

Looking good. Mostly I'm suggesting clarifications to the code comments, but one important improvement is to avoid writing the temporary file altogether.

@adam2392 changed the title from "[MRG] Move files into config to a python file" to "[MRG] Move files in config to a python file" Sep 20, 2021
adam2392 and others added 3 commits September 20, 2021 13:45
@adam2392 (Member Author)

OK, I think this is good to go now, @drammock.

FYI: #9763 is the downstream PR that starts generalizing the dataset fetcher. Trying things out, I realized we also need to pull the processor for each dataset out, e.g. the Unzip/Untar/etc.

@drammock (Member) commented Sep 20, 2021

> the downstream PR that starts generalizing the dataset fetcher. Trying things out, I realized we also need to pull the processor for each dataset out, e.g. the Unzip/Untar/etc.

Most of the datasets work with the default values of the pooch processors, so for the generic function I suggest:

processor : None | "zip" | "tar" | instance of pooch.Unzip | instance of pooch.Untar

where None is for files that aren't compressed at all, and the string arguments will use our internally constructed pooch.Unzip or pooch.Untar, similar to how we do it in datasets/utils.py around line 280.
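
That dispatch is easy to centralize; a minimal sketch, assuming the destination directory is already resolved (the helper name _resolve_processor is made up here; pooch.Unzip and pooch.Untar both accept an extract_dir keyword):

```python
import pooch

def _resolve_processor(processor, extract_dir):
    """Map the string shorthand onto a pooch processor (sketch only)."""
    if processor == "zip":
        return pooch.Unzip(extract_dir=extract_dir)
    elif processor == "tar":
        return pooch.Untar(extract_dir=extract_dir)
    # None (an uncompressed file) or an already-built pooch processor
    # passes through unchanged.
    return processor
```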

For this PR, it would probably be good to pull out the processors into the individual dicts that define each dataset. I can push a commit to do that if you're not sure how best to do it.

@adam2392 (Member Author)

> Most of the datasets work with the default values of the pooch processors, so for the generic function I suggest:
>
> processor : None | "zip" | "tar" | instance of pooch.Unzip | instance of pooch.Untar
>
> where None is for files that aren't compressed at all, and the string arguments will use our internally constructed pooch.Unzip or pooch.Untar, similar to how we do it in datasets/utils.py around line 280.

Sounds good; that works for me.

> For this PR, it would probably be good to pull out the processors into the individual dicts that define each dataset. I can push a commit to do that if you're not sure how best to do it.

How would we actually do this, since extract_dir=path relies on the path variable defined internally?

@drammock (Member)

> How would we actually do this, since extract_dir=path relies on the path variable defined internally?

Oops, you're right. I was thinking that when the path variable is passed as a parameter to _data_path() and is already a string (not None), it is left untouched (see the logic in the _get_path() helper function). But we're not doing that: I forgot we're building the path from the config_key for the internal datasets. So never mind that suggestion.

@adam2392 (Member Author)

I think we then need to move the _get_path call into fetch_dataset, before calling _data_path. That way, the path is explicitly defined before _data_path is called, and the processor can be defined outside _data_path with the path it needs, roughly as sketched below.
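
A sketch of that ordering, reusing the _resolve_processor helper sketched earlier; the signatures of _get_path and _data_path here are assumptions for illustration, not the actual internal API:

```python
def fetch_dataset(dataset_params, processor="zip", path=None):
    # Resolve the destination first (via the dataset's config_key), so the
    # processor can be built with a concrete extract_dir before any download.
    path = _get_path(path, dataset_params["config_key"],
                     dataset_params["folder_name"])  # assumed signature
    processor = _resolve_processor(processor, extract_dir=path)
    return _data_path(dataset_params, path=path,
                      processor=processor)  # assumed signature
```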

Probably best to save that for the next PR, since it depends on creating the generic public API function?

If so, this is good to go by me.

@adam2392 (Member Author)

Looks like the Azure pipeline failures are timeouts. I don't think they're due to this PR, at least?

@drammock (Member)

> Looks like the Azure pipeline failures are timeouts. I don't think they're due to this PR, at least?

That's been happening for several days; I haven't had a chance to look into it yet.

@drammock merged commit 9627d09 into mne-tools:main Sep 20, 2021
@adam2392 deleted the dataset branch September 21, 2021 00:05
# of the downloaded dataset (ex: "MNE_DATASETS_EEGBCI_PATH").
# - (optional) token : An authorization token, if one is needed to access the
# download URL.
fnirs_motor = dict(
Member

@adam2392 sorry I was slow to review here. I would suggest you write it as:

MNE_DATASETS = dict()

MNE_DATASETS['fnirs_motor'] = dict(

This way you don't need to replicate the list of datasets below, so you won't forget one.
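
For example (a sketch; only the pattern matters, not the field values):

```python
MNE_DATASETS = dict()
MNE_DATASETS['fnirs_motor'] = dict(
    # ... archive_name, folder_name, url, hash, etc., as above ...
)
# Any list of all datasets can then be derived from the dict itself,
# so adding a dataset is a single edit:
dataset_names = sorted(MNE_DATASETS)
```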

Can you open a new PR?
