WIP: initialize mklfft #2623
Conversation
mne/filter.py
Don't use a getter function. Just allow doing

```python
from mne.utils import fft
```

and make sure it corresponds to MKL if available. OK?
So in mne.utils, I'd have something like

```python
def fft(*args, **kwargs):
    try:
        from mklfft.fftpack import fft
    except ImportError:
        from scipy.fftpack import fft
    return fft(*args, **kwargs)
```

... or if not, can you point me to some code that does what you mean?
I tried modeling this after how ica and xdawn handle fast_dot.
I tried modeling this after how ica and xdawn handle fast_dot.
Just add to the utils.py file

```python
try:
    from mklfft.fftpack import fft
except ImportError:
    from scipy.fftpack import fft
```

it should do the trick.
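The module-level fallback above binds the name once at import time, so callers pay no per-call import overhead. A minimal self-contained sketch of the pattern, with a round-trip sanity check as usage (assumes only that `mklfft.fftpack.fft` is a drop-in replacement for `scipy.fftpack.fft`; the inverse is taken from scipy in either case):

```python
import numpy as np

# Bind `fft` once at import time: MKL's drop-in FFT when the optional
# `mklfft` package is installed, scipy's implementation otherwise.
try:
    from mklfft.fftpack import fft  # noqa
except ImportError:
    from scipy.fftpack import fft  # noqa
from scipy.fftpack import ifft  # inverse always from scipy here

# Round-trip sanity check through whichever forward FFT was bound.
x = np.random.randn(1024)
y = ifft(fft(x))
print(np.allclose(x, y.real))
```

Since the `try`/`except` runs only when the module is first imported, this is also cheaper than re-importing inside a wrapper function on every call.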
Like this? (the error was PEP8)
So far so good. Let us know when you have benchmarks.
On my i5 iMac, it's actually slower to use mklfft.
I can't get the TF decomposition functions to use more than 1 thread (maybe because of it being wrapped inside …). For filtering, with 48 threads, representative runs:

- With MKL (12 threads): …
- Without MKL (n_jobs=1): …
- Without MKL (n_jobs=12): …
- Without MKL (n_jobs=48): …
- With MKL and parallel (n_jobs=4-8, MKL threads=4-12): …

So nothing much beyond regular parallel with joblib. I guess it would save memory to run 4 jobs with 10 MKL threads each rather than 40 regular jobs?
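The "few fat jobs" idea above can be sketched roughly as follows. This is an assumption-laden illustration, not code from the PR: the MKL thread count is typically controlled via the `MKL_NUM_THREADS` environment variable (read when MKL loads, so it must be set before the workers import numpy/scipy), and it only has an effect when the numerical libraries are MKL-backed; the worker count and array sizes are arbitrary:

```python
# Sketch: 4 joblib workers whose FFTs could each use several MKL
# threads, instead of many single-threaded workers.  With a plain
# (non-MKL) scipy this still runs, just without intra-FFT threading.
import os
os.environ.setdefault("MKL_NUM_THREADS", "10")  # hypothetical setting

import numpy as np
from joblib import Parallel, delayed
from scipy.fftpack import fft

data = [np.random.randn(2 ** 14) for _ in range(8)]
results = Parallel(n_jobs=4)(delayed(fft)(d) for d in data)
print(len(results))
```

The memory argument is that each worker process holds its own copy of the data it operates on, so 4 workers with 10 FFT threads each keep far fewer copies alive than 40 single-threaded workers.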
Thanks for taking a stab at this @jona-sassenhagen. Something looks weird with your commit history: there are merge commits. You can fix it with a rebase.
Thanks @jasmainak ... not sure it's worth it though, seeing as I don't see improvements. Is anyone else getting speed improvements? I can imagine it might make a difference for systems with many cores but little RAM, though I don't have such a machine available.
mne/utils.py
is this necessary? I think we can live with a small exception here.
better just to use `# noqa` at the end of the offending lines
Last time I tried it, I remember not getting any improvements ...
In that thread, Alex said it was due to multithreading being required. But I somehow don't see improvements even when really hammering all of our cores with …
@jona-sassenhagen: can you clarify how I should read your benchmarks? What are real, user and sys?
That's just output from the standard Unix `time` command.
Could it be that there aren't enough FFT operations to make a real difference? Maybe you need to benchmark something with many FFT operations ...
I'd think filtering raws with …
Did you try TF decomposition on epochs? That was really slow. It might be worthwhile to profile and figure out if at least the …
Which method (multitaper etc.) do you think would benefit the most?
No idea ... I remember having tried …
By pretty slow, I mean it could take 10 minutes to run on my computer ...
@jasmainak that's sadly not disproportionately slow for TF analyses, I fear. I've had TF decompositions take days to run for a full experiment with low frequencies (= long windows) in EEGLAB.
@agramfort thoughts?
Can you share a bench script?
@jona-sassenhagen would you have time for this in the coming days? If not, I can take a look once the EEGLAB .set reader is merged.
I'm a bit demotivated due to the lack of results ... feel free to take over.
Closing this for now: I just opened two other PRs, @Eric89GXL opened his issue on the topic, @jasmainak indicated he might want to take over, and I'm not making progress.
My attempt at #1916.
Please check if the general approach seems okay.
Basically, there is a function in `mne.utils`, `_get_mkl_fft`, that takes as arguments a string and `n_jobs`. If `n_jobs` isn't CUDA, it tries to import the function denoted by the string from `mklfft.fftpack`. It will also try to import `mkl` from mkl-service and set the number of threads to `n_jobs`. Otherwise, it returns the corresponding function from `scipy.fftpack`.