
Conversation

@mscheltienne (Member)

Minor documentation fix for the apply_function methods, e.g. https://mne.tools/dev/generated/mne.io.Raw.html#mne.io.Raw.apply_function

If n_jobs > 1, more memory is required as len(picks) * n_times additional time points need to be temporarily stored in memory.

-> should be ``n_jobs * n_times``, not ``len(picks) * n_times``.
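
For context, a minimal usage sketch of the method in question (the file path is hypothetical; apply_function requires preloaded data):

import mne

# hypothetical file; preload=True is required for apply_function
raw = mne.io.read_raw_fif("sample_raw.fif", preload=True)
# with the default channel_wise=True, fun receives one channel at a time
raw.apply_function(lambda x: x * 1e-3, picks="eeg", n_jobs=2)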

@mscheltienne (Member, Author) commented Nov 14, 2022

Could not build wheels for pydata-sphinx-theme

Can someone with the permissions restart the CircleCI workflow? It looks like it does not restart on close/re-open the way Azure does.

@agramfort (Member)

just push an empty commit @mscheltienne to restart CI
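
(For reference, a sketch of the usual commands, assuming the PR branch is checked out locally:)

git commit --allow-empty -m "Empty commit to trigger CI"
git push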

@mscheltienne (Member, Author)

I'm using GitHub Desktop most of the time, which does not yet support empty commits 😞
I was sure I did not have the git CLI configured with my credentials on this computer... but apparently, I was wrong!

@mscheltienne changed the title from "[Doc] fix/improve documentation of n_jobs for apply_function" to "[DOC, MRG] fix/improve documentation of n_jobs for apply_function" on Nov 14, 2022

  .. note:: If ``n_jobs`` > 1, more memory is required as
-           ``len(picks) * n_times`` additional time points need to
+           ``n_jobs * n_times`` additional time points need to
Member
are you certain that this change is correct? The original text implies that full copies of the array are made for each job (which would be unfortunate, but is plausible). I just want to confirm that you checked this before proposing the change.

@mscheltienne (Member, Author)

Very good point, and with all the efforts to save memory left and right, I did not even think about it. Duplicating the entire data array `n_jobs` times would be very unfortunate.

But it looks like it was well designed:

mne-python/mne/io/base.py

Lines 950 to 965 in aeeb111

if channel_wise:
    parallel, p_fun, n_jobs = parallel_func(_check_fun, n_jobs)
    if n_jobs == 1:
        # modify data inplace to save memory
        for idx in picks:
            self._data[idx, :] = _check_fun(fun, data_in[idx, :],
                                            **kwargs)
    else:
        # use parallel function
        data_picks_new = parallel(
            p_fun(fun, data_in[p], **kwargs) for p in picks)
        for pp, p in enumerate(picks):
            self._data[p, :] = data_picks_new[pp]
else:
    self._data[picks, :] = _check_fun(
        fun, data_in[picks, :], **kwargs)

If I read this correctly, each job receives `data_in[p]`, which contains a single channel.
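
For intuition, here is a standalone sketch of the same pattern using joblib directly (the array shape and the doubling function are made up for illustration):

from joblib import Parallel, delayed
import numpy as np

def fun(x):
    # hypothetical channel-wise function
    return x * 2

data_in = np.random.randn(4, 1000)  # stands in for (len(picks), n_times)

# each delayed call receives a single row (one channel), so no worker ever
# holds the full array; the collected results, however, add up to a full
# second copy of the picked data
data_picks_new = Parallel(n_jobs=2)(
    delayed(fun)(data_in[p]) for p in range(len(data_in)))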

@mscheltienne (Member, Author) commented Nov 15, 2022

But I guess in the end `data_picks_new` is a duplicate of the entire array... so the docstring was in fact correct. My bad!

@mscheltienne (Member, Author)

Suggested change:
-           ``n_jobs * n_times`` additional time points need to
+           ``len(picks) * n_times`` additional time points need to

Member

wait... if I'm reading the code you quoted correctly, `data_picks_new` is one array that is the same size as `data_in`, but (crucially) it's outside the for-loop over `picks`. So that suggests that `n_jobs * n_times` (your original change) was in fact correct? But I'm not intimately familiar with the parallel code; maybe @larsoner can clarify for us.

@mscheltienne (Member, Author) commented Nov 15, 2022

I don't think so. `data_picks_new` is indeed the same shape as `data_in` (except it's a list of `len(picks)` arrays of shape `(n_times,)` instead of a 2D array).

So after the execution of `parallel()` we end up with two arrays of shape `(len(picks), n_times)`: `self._data` (also called `data_in`, but it's the same array) and `data_picks_new`, until garbage collection at the end of the function deletes the latter.

Temporarily, we end up with a second array of shape `(len(picks), n_times)`, even though each job receives only one channel (which initially led me to believe the docstring was wrong).
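
To put a rough number on that temporary copy (hypothetical recording size, assuming float64 samples):

n_picks, n_times = 64, 600_000  # e.g. 64 channels, 10 minutes at 1 kHz
extra_bytes = n_picks * n_times * 8  # 8 bytes per float64 sample
print(f"~{extra_bytes / 1e6:.0f} MB held until garbage collection")  # ~307 MB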

Member

ah, you are right. I was briefly thinking it was `n_jobs * len(picks) * n_times` (so much worse!) without realizing that I was assuming that. I'll merge then. Thanks @mscheltienne!

@drammock merged commit be6bee7 into mne-tools:main on Nov 15, 2022
@mscheltienne deleted the fix_doc branch on November 15, 2022 at 21:45
larsoner added a commit to ruuskas/mne-python that referenced this pull request Nov 17, 2022
* upstream/main:
  fix epochs.plot_image for EMG data (mne-tools#11322)
  fix fontawesome icon display (mne-tools#11328)
  [DOC, MRG] fix/improve documentation of n_jobs for apply_function (mne-tools#11325)
  [MAINT, MRG] fix is_mesa (mne-tools#11313)
  BUG: Fix bug with parallel progress bars (mne-tools#11311)
  BUG: Fix bug with report replacement (mne-tools#11318)
  MAINT: Fix docs (mne-tools#11317)
  Fix typo in changelog (mne-tools#11315)