Conversation

@jona-sassenhagen
Contributor

My attempt at #1916

Please check if the general approach seems okay.
Basically, there is a function in mne.utils, _get_mkl_fft, that takes a string and n_jobs as arguments. If n_jobs isn't 'cuda', it tries to import the function named by the string from mklfft.fftpack; it also tries to import mkl from mkl-service and set the number of threads to n_jobs.
Otherwise, it returns the corresponding function from scipy.fftpack.
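
A rough sketch of what I mean (illustrative only; names and details may differ slightly from the actual commit):

def _get_mkl_fft(name, n_jobs):
    """Return the FFT function `name`, preferring mklfft when possible."""
    if n_jobs != 'cuda':
        try:
            import mkl  # provided by mkl-service
            mkl.set_num_threads(n_jobs)
            from mklfft import fftpack
            return getattr(fftpack, name)
        except ImportError:
            pass
    from scipy import fftpack
    return getattr(fftpack, name)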

mne/filter.py Outdated
Member

don't use a get function. Just make it possible to do

from mne.utils import fft

and make sure it corresponds to mkl if available.

ok?

Contributor Author

So in mne.utils, I'd have something like

def fft(*args, **kwargs):
    try:
        from mklfft.fftpack import fft
    except ImportError:
        from scipy.fftpack import fft
    return fft(*args, **kwargs)

... or if not, can you point me to some code that does what you mean?

I tried modeling this after how ica and xdawn handle fast_dot.
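
For reference, the fast_dot pattern I mean looks roughly like this (quoting from memory, not copied from the repo):

try:
    from sklearn.utils.extmath import fast_dot
except ImportError:
    from numpy import dot as fast_dot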

Member

just add to the utils.py file

try:
    from mklfft.fftpack import fft
except ImportError:
    from scipy.fftpack import fft

it should do the trick

@jona-sassenhagen
Contributor Author

Like this? (The remaining error is just PEP8.)

@agramfort
Member

so far so good.

let us know when you have benchmarks.

@jona-sassenhagen
Contributor Author

On my i5 iMac, it's actually slower to use mklfft with 4 or 8 threads. I can't currently test it on our actually powerful computers because somebody is using them to run MATLAB ...

@jona-sassenhagen
Contributor Author

I can't get the TF decomposition functions to use more than one thread (maybe because they're wrapped inside parallel? But see below).

For filtering, these are representative runs.

With MKL (48 threads):
real 0m22.330s
user 9m39.524s
sys 0m18.133s

With MKL (12 threads):
real 0m21.795s
user 3m36.406s
sys 0m9.849s

Without MKL (n_jobs=1):
real 0m22.783s
user 0m19.933s
sys 0m2.680s

Without MKL (n_jobs=12):
real 0m16.730s
user 0m36.374s
sys 0m10.653s

Without MKL (n_jobs=48):
real 0m17.527s
user 0m35.702s
sys 0m20.013s

With MKL and parallel (n_jobs=4-8, MKL threads=4-12):
real 0m17.593s
user 6m39.953s
sys 0m19.461s

So nothing much beyond regular parallel with joblib. I guess it would save memory to run 4 jobs with 10 MKL threads each rather than 40 regular jobs?
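
For reference, what I mean by that last point, as a toy sketch (the numbers and the helper are made up, not from this PR):

import numpy as np
from joblib import Parallel, delayed
from scipy.fftpack import fft  # stand-in for the MKL-backed fft

def _process(chunk, n_mkl_threads):
    # cap MKL's thread count inside each worker, if mkl-service is installed
    try:
        import mkl
        mkl.set_num_threads(n_mkl_threads)
    except ImportError:
        pass
    return fft(chunk)

# e.g. 4 joblib jobs x 10 MKL threads each, instead of 40 single-threaded jobs
chunks = np.array_split(np.random.randn(40, 2 ** 16), 4)
results = Parallel(n_jobs=4)(delayed(_process)(c, 10) for c in chunks)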

@jasmainak
Member

Thanks for taking a stab at this @jona-sassenhagen. Something looks weird with your commit history: there are merge commits. You can fix it with a rebase.

@jona-sassenhagen
Contributor Author

Thanks @jasmainak ... not sure it's worth it though, seeing as I don't see improvements. Is anyone else getting speed improvements? I can imagine it might make a difference for systems with many cores but little RAM, though I don't have such a machine available.

mne/utils.py Outdated
Member

is this necessary? I think we can live with a small exception here.

Member

better just to use # noqa at the end of the offending lines

@jasmainak
Member

From what I remember, I wasn't getting any improvements when I tried it ...

@jona-sassenhagen
Contributor Author

In that thread, Alex said it was because multithreading is required. But somehow I don't see improvements even when really hammering all of our cores with mklfft. I don't know.

@jasmainak
Member

@jona-sassenhagen : can you clarify how I should read your benchmarks? What are real, user and sys?

@jona-sassenhagen
Contributor Author

That's just the output of standard Unix time: "real" is the wall-clock time the script takes to run, "user" is the CPU time spent in user mode summed over all threads (which is why it goes way up with more threads), and "sys" is CPU time spent in the kernel.
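
You can see the same distinction from inside Python, if that helps (a toy example, nothing to do with the PR itself):

import time
import numpy as np
from scipy.fftpack import fft

x = np.random.randn(8, 2 ** 20)
wall0, cpu0 = time.perf_counter(), time.process_time()
fft(x)
wall = time.perf_counter() - wall0  # roughly "real"
cpu = time.process_time() - cpu0  # roughly "user" + "sys" for this process
print('wall: %.3f s, CPU: %.3f s' % (wall, cpu))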

@jasmainak
Member

Could it be that there aren't enough FFT operations to make a real difference? Maybe you need to benchmark something with many FFT operations ...

@jona-sassenhagen
Contributor Author

I'd think filtering raws with method='fft' is one of the most FFT-heavy tasks we have, or is there a better test?
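
Roughly, the kind of thing I've been timing (a sketch on synthetic data, not the exact script, and the parameters are placeholders):

import numpy as np
import mne

sfreq = 1000.
info = mne.create_info(64, sfreq, 'eeg')
raw = mne.io.RawArray(np.random.randn(64, int(120 * sfreq)), info)  # 2 min of noise
# run under `time python bench_filter.py`, varying n_jobs and whether
# mklfft is importable
raw.filter(1., 40., method='fft', n_jobs=1)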

@jasmainak
Member

Did you try TF decomposition on epochs? That was really slow. It might be worthwhile to profile it and figure out whether at least the FFT operations are faster.

@jona-sassenhagen
Contributor Author

Which method (multitaper, etc.) do you think would benefit the most?

@jasmainak
Member

No idea ... I remember having tried tfr_morlet with use_fft=True and that was pretty slow. Also, for profiling: https://github.com/rkern/line_profiler, in case you don't know it already.
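
In case it helps, a minimal way to point it at tfr_morlet (synthetic epochs so it runs standalone; just a sketch, not the benchmark discussed above):

import numpy as np
import mne
from mne.time_frequency import tfr_morlet
from line_profiler import LineProfiler

# toy epochs: 50 epochs x 32 EEG channels x 2 s at 250 Hz
info = mne.create_info(32, 250., 'eeg')
epochs = mne.EpochsArray(np.random.randn(50, 32, 500), info)

def run_tfr(epochs):
    freqs = np.arange(4., 40., 2.)
    return tfr_morlet(epochs, freqs=freqs, n_cycles=freqs / 2.,
                      use_fft=True, return_itc=False)

profiler = LineProfiler()
profiler(run_tfr)(epochs)
profiler.print_stats()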

@jasmainak
Member

By pretty slow, I mean it could take 10 minutes to run on my computer ...

@jona-sassenhagen
Contributor Author

@jasmainak that's sadly not disproportionately slow for TF analyses, I fear. I've had TF decompositions take days to run for a full experiment with low frequencies (= long windows) in EEGLAB.
If you really want speed, CUDA is the way to go.

@jasmainak
Member

@agramfort thoughts?

@agramfort
Member

can you share a bench script?

@jasmainak
Member

@jona-sassenhagen would you have time for this in the coming days? If not, I can take a look once the EEGLAB .set reader is merged.

@jona-sassenhagen
Contributor Author

I'm a bit demotivated due to the lack of results ... feel free to take over.

@jona-sassenhagen
Contributor Author

Closing this for now as I just opened two other PRs, @Eric89GXL opened his issue on the topic, @jasmainak indicated he might want to take over, and I'm not making progress.
