Skip to content

Conversation

@mscheltienne
Copy link
Member

@mscheltienne mscheltienne commented Jan 16, 2022

Closes #10022.

Outdated bullet points:

  • Harmonize keyword arguments among all datasets: force_update, update_path, download and verbose are now keyword-only everywhere and not only on a subset.
  • Remove tqdm dependency for progress bars.
  • Suppress progress bars if verbose is set to False.
  • Remove pooch strict dependency if datasets are already downloaded.

Keyword only arguments

If the proposal doesn't work for you, I can revert and at least set verbose as keyword-only everywhere.


tqdm

I've set tqdm as a soft dependency. If it is not installed, progress bars are disabled. I think this is simpler than checking for the verbosity level and defining a limit for the progress bars.


pooch

This one is a bit tricky, I will do it later.
The dependency is checked both in fetch_dataset and in _download_mne_dataset.

https://github.com/mscheltienne/mne-python/blob/37356f26cb765d74dc0b5fcf46c78bbbcbf22854/mne/datasets/_fetch.py#L130-L131

https://github.com/mscheltienne/mne-python/blob/37356f26cb765d74dc0b5fcf46c78bbbcbf22854/mne/datasets/utils.py#L151-L152

@mscheltienne mscheltienne marked this pull request as draft January 16, 2022 12:15
@larsoner
Copy link
Member

Have you seen #10199 and the issue linked there?

@mscheltienne
Copy link
Member Author

mscheltienne commented Jan 16, 2022

I did see the issue earlier, but not the PR! I didn't think you would change it so quickly.
If both pooch and tqdm become hard dependencies for mne, then #10022 is outdated.

@larsoner
Copy link
Member

Maybe we should make both a soft dependency, not sure.

I think this is simpler than checking for the verbosity level and defining a limit for the progress bars.

Triaging based on level whether or not to show a progress bar was intentional by design. To me if you're in verbose=True (info) mode you should see a bar and if warn or above, you shouldn't. This gives people direct call-by-call level control, rather than just deciding based on whether or not a package (pooch) is present.

@cbrnr
Copy link
Contributor

cbrnr commented Jan 17, 2022

I think we need to decide which packages are core dependencies and which are optional dependencies. Fetching data sets should be considered a core functionality, and therefore I'd keep both tqdm (which doesn't have any dependencies itself other than colorama on Windows) and pooch (which does require some packages itself, most notably requests) as required packages.

I have also been thinking about our recurrent discussion on packages that are not pure Python. I don't think this should be a criterium for us to include a package as a dependency or not (otherwise we need to drop NumPy, SciPy, and Matplotlib). It is much more important if a package is actively maintained and provides a broad variety of binary wheels on PyPI.

@mscheltienne
Copy link
Member Author

Triaging based on level whether or not to show a progress bar was intentional by design. To me if you're in verbose=True (info) mode you should see a bar and if warn or above, you shouldn't.

I was going to add it too; but I started with checking if tqdm was installed or not, this way dataset fetching will at least work regardless of tqdm.

I'll wait for the discussion/PRs on core dependencies to be merged before I continue this PR.

@larsoner
Copy link
Member

Suppress progress bars if verbose is set to False.

I don't think you need to make any changes for this to be the case, it already works this way for me on main -- is it different for you? If so then we're doing something wrong there...

Remove pooch strict dependency if datasets are already downloaded.

Since I'm going to be messing with this in #10199 a bit, I might get to this there. I think your * changes in this PR might still be worthwhile, though, since I think it will help users not do something bad even if it's technically a backward compat break...

@mscheltienne
Copy link
Member Author

mscheltienne commented Jan 17, 2022

I don't think it works as intended. After more testing using main and the sample dataset:

  1. If tqdm is not installed, you can not download a dataset, regardless of verbose. The download should work regardless of tqdm.
  2. mri\brain-neuromag\sets\COR.fif has a problem and is downloaded/untar in a protected mode on my windows PC. I can not redownload and untar the dataset because a PermissionError is raised.

image

  1. After deleting the dataset and using sample.data_path(verbose=False), the path is returned, BUT the dataset is not downloaded. But indeed, there is no progress bar displayed.
  2. If both force_update and download are set to True with verbose=False, some funny thing happens. I had both times where I had the progress bar, and others when I didn't (legit, just using up-arrow to re-run the same command in the python terminal..)

To be confirm on macOS and Linux..

@mscheltienne
Copy link
Member Author

For the * change, I know it's backward-incompatible, it's just a proposal. IMO, this change reduces the risk of users making mistakes and is worth the backward-incompatibility. Open for debate 😄

@cbrnr
Copy link
Contributor

cbrnr commented Jan 17, 2022

For the * change, I know it's backward-incompatible, it's just a proposal. IMO, this change reduces the risk of users making mistakes and is worth the backward-incompatibility. Open for debate 😄

Here we go! I'm not a big fan of restricting what users are allowed to do in many situations. IMO the only advantage of kw-only args is that devs can freely shift around the order of parameters.

Regarding tqdm, I'd just make it a hard dependency. It's a small Python-only package so why bother trying to download a file without a progress bar?

@mscheltienne
Copy link
Member Author

mscheltienne commented Jan 17, 2022

Regarding tqdm, I'd just make it a hard dependency. It's a small Python-only package so why bother trying to download a file without a progress bar?

Yes, if it becomes a core dependency, this problem is solved. For any related change, I will wait for #10199 and eventual follow-ups (and I will probably revert the soft-dependency change I already added here).

@larsoner
Copy link
Member

mri\brain-neuromag\sets\COR.fif has a problem and is downloaded/untar in a protected mode on my windows PC. I can not redownload and untar the dataset because a PermissionError is raised.

We should maybe see if our code can force-remove even write-protected files. If it can't, we could update the dataset to have chmod a+w or so on the files... but it would be nice not to have to do this. Can you see if there is some more aggressive way to delete files on Windows? For me it works on macOS despite the strict permissions:

$ ls -al ~/mne_data/MNE-sample-data/subjects/sample/mri/brain-neuromag/sets/COR.fif 
-r--r--r--  1 larsoner  staff  16903824 Nov 18  2013 /Users/larsoner/mne_data/MNE-sample-data/subjects/sample/mri/brain-neuromag/sets/COR.fif
(base) bunker:mne-python larsoner$ python -c "import mne; mne.datasets.sample.data_path(force_update=True, verbose=True)"
Downloading file 'MNE-sample-data-processed.tar.gz' from 'https://osf.io/86qa2/download?version=5' to '/Users/larsoner/mne_data'.
  2%|▋                                    | 29.6M/1.65G [00:04<13:22, 2.02MB/s]

If both force_update and download are set to True with verbose=False, some funny thing happens. I had both times where I had the progress bar, and others when I didn't

Do you mean like this as a "did not have progress bar" (second call below)?

$ python -c "import mne; print(mne.datasets.misc.data_path(verbose=True))"
/Users/larsoner/mne_data/MNE-misc-data
$ python -c "import mne; print(mne.datasets.misc.data_path(verbose=True, force_update=True))"
Downloading file 'mne-misc-data-0.23.tar.gz' from 'https://codeload.github.com/mne-tools/mne-misc-data/tar.gz/0.23' to '/Users/larsoner/mne_data'.
0.00B [00:00, ?B/s]     
Untarring contents of '/Users/larsoner/mne_data/mne-misc-data-0.23.tar.gz' to '/Users/larsoner/mne_data'
/Users/larsoner/mne_data/MNE-misc-data

If so, there is not much we should do I think -- sometimes GitHub will give you the file size, sometimes it won't (same might be true for osf.io, not sure?). We could make multiple requests but I think just having some updating display here is good enough -- when it was running, I could see the file size actually updating, so it was clear something was happening. And each person doesn't have to call these super often, so I'd say it's a low priority bug at least.

After deleting the dataset and using sample.data_path(verbose=False), the path is returned

This I cannot replicate on macOS, assuming you did something like this:

$ rm -Rf /Users/larsoner/mne_data/MNE-misc-data
(base) bunker:mne-python larsoner$ python -c "import mne; print(mne.datasets.misc.data_path(verbose=True))"
Dataset misc version 0.0 out of date, latest version is 0.23
Downloading file 'mne-misc-data-0.23.tar.gz' from 'https://codeload.github.com/mne-tools/mne-misc-data/tar.gz/0.23' to '/Users/larsoner/mne_data'.
24.9MB [00:08, 5.93MB/s

All we do is look for the existence of the MNE-sample-data directory to decide if the dataset is there or not, so make sure you delete that directory (in addition to its contents).

@larsoner
Copy link
Member

I spoke too soon -- even on macOS there is a problem with COR.fif:

$ python -c "import mne; mne.datasets.sample.data_path(force_update=True, verbose=True)"
Downloading file 'MNE-sample-data-processed.tar.gz' from 'https://osf.io/86qa2/download?version=5' to '/Users/larsoner/mne_data'.
100%|█████████████████████████████████████| 1.65G/1.65G [00:00<00:00, 5.76TB/s]
Untarring contents of '/Users/larsoner/mne_data/MNE-sample-data-processed.tar.gz' to '/Users/larsoner/mne_data'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<decorator-gen-529>", line 22, in data_path
  File "/Users/larsoner/python/mne-python/mne/datasets/sample/sample.py", line 15, in data_path
    return _download_mne_dataset(
...
  File "/Users/larsoner/opt/miniconda3/lib/python3.8/tarfile.py", line 2178, in makefile
    with bltn_open(targetpath, "wb") as target:
PermissionError: [Errno 13] Permission denied: '/Users/larsoner/mne_data/MNE-sample-data/subjects/sample/mri/brain-neuromag/sets/COR.fif'

But after a chmod a+w ~/mne_data/MNE-sample-data/subjects/sample/mri/brain-neuromag/sets/COR.fif, the subsequent call to data_path(force_update=True, verbose=True) works, so I think we should make this change at the MNE-sample-data end.

@mscheltienne
Copy link
Member Author

mscheltienne commented Jan 17, 2022

Do you mean like this as a "did not have progress bar" (second call below)?

No, I did twice: python -c "import mne; print(mne.datasets.sample.data_path(verbose=True, download=True, force_update=True))"; first call I had just the path, and second call I had a functioning progressbar.
In the first call, I did not get any incomplete verbose from the download; just nothing. Anyway, this is the least important bug here.


COR.fif should be fixed at MNE-sample-data end; but I can look for a more aggressive deletion policy just in case.

After deleting the dataset and using sample.data_path(verbose=False), the path is returned

This I cannot replicate on macOS, assuming you did something like this:

This issue will be fixed when COR.fif is fixed. I got it when I used the same process for a second call to .data_path() after a permission error is raised. I guess a worker/processor from pooch did not close and is causing the issue. (c.f. below, the second call did not download anything; even if the dataset is removed from another terminal).

image


Also, I am getting progress bars with verbose=False.

Capture

@larsoner
Copy link
Member

This issue will be fixed when COR.fif is fixed.

Pushed a commit to fix this.

Also, I am getting progress bars with verbose=False.

Argh, yes I can replicate this. I think I understand why it's happening -- we used to triage based on the current verbose level, but in the switch to pooch we lost this functionality. It seems like a bug to me that it no longer behaves nicely.

#10199 should be in soon, so feel free to wait for that or continue fixing bugs here and rebase/merge later @mscheltienne !

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
@mscheltienne mscheltienne changed the title mne.datasets should not require tqdm and should not require pooch if datasets are already present Fix verbose of mne.datasets fetching Jan 20, 2022
@mscheltienne
Copy link
Member Author

Let's see if it's green:

  • Fix MNE-sample-data/subjects/sample/mri/brain-neuromag/sets/COR.fif file.
  • Fix progressbar display based on verbosity level
  • Removes soft dependency/imports for pooch

Proposal:

  • Keyword only argument * for dataset.data_path() functions.
    -1 from @cbrnr for now.
    At least; I will put verbose as key-word only.

@larsoner
Copy link
Member

At least; I will put verbose as key-word only.

+1 for this, simple enough (and more backward compatible)

@mscheltienne
Copy link
Member Author

@larsoner Clearly I broke something by removing the _soft_import(); and I don't exactly understand what or what this test is checking for:

=================================== FAILURES ===================================
_____________________________ test_module_nesting ______________________________
mne/tests/test_import_nesting.py:44: in test_module_nesting
    assert code == 0, stdout + stderr
E   AssertionError: 
E     Found un-nested import(s) for ['pooch', 'pooch._version', 'pooch.core', 'pooch.downloaders', 'pooch.hashes', 'pooch.processors', 'pooch.utils', 'tqdm', 'tqdm._dist_ver', 'tqdm._monitor', 'tqdm._tqdm_pandas', 'tqdm.cli', 'tqdm.gui', 'tqdm.std', 'tqdm.utils', 'tqdm.version']
E   assert 16 == 0

@larsoner
Copy link
Member

@larsoner Clearly I broke something by removing the _soft_import(); and I don't exactly understand what or what this test is checking for:

It's testing to make sure module imports are nested. We try not to increase the number of modules we import at import mne time because it makes it take longer and longer. So don't use _soft_import but do nest the import pooch and import tqdm calls.

@mscheltienne
Copy link
Member Author

@larsoner Oh OK, that makes sense. Until now I believed it wasn't a good habit to nest the import in a function, but that's a very good point! I'll fix it.

@mscheltienne mscheltienne marked this pull request as ready for review January 20, 2022 15:54
@mscheltienne mscheltienne changed the title Fix verbose of mne.datasets fetching MRG: Fix verbose of mne.datasets fetching Jan 20, 2022
@mscheltienne mscheltienne requested a review from larsoner January 20, 2022 17:19
Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs latest.inc entry in the BUG section saying that verbose will be respected again when downloading data. Otherwise LGTM +1 for merge

@larsoner larsoner merged commit 2b2b0cf into mne-tools:main Jan 20, 2022
@larsoner
Copy link
Member

Thanks @mscheltienne !

@mscheltienne mscheltienne deleted the tqdm_and_pooch_req branch January 20, 2022 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mne.datasets requires tqdm regardless of verbose

3 participants