@mainakjas
Contributor

No description provided.

@larsoner
Member

@mainakjas this made things worse somehow, right? Should we close, or are there additional things to try?

@mainakjas
Contributor Author

sorry @Eric89GXL for not getting back to this yet. I think I had some things to try on this but it's not on the top of my priority list at the moment

@agramfort
Member

I just did a quick bench here with this script:

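import numpy as np
import mkl  # assumed import: the timings below call mkl.set_num_threads
from mklfft.fftpack import fft as fft_mkl
from scipy.fftpack import fft as fft_scipy

mkl.set_num_threads(1)  # rerun with 2 and 4 for the other timings

x = np.random.randn(2 ** 13)
print(x.shape)

xf_scipy = fft_scipy(x)
xf_mkl = fft_mkl(x)

# Both implementations should agree before comparing speed
np.testing.assert_array_almost_equal(xf_scipy, xf_mkl)

# %timeit is an IPython magic, so run this in an IPython session
%timeit fft_scipy(x)
%timeit fft_mkl(x)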

With mkl.set_num_threads(1) I get (first line scipy, second line MKL):

10000 loops, best of 3: 103 µs per loop
10000 loops, best of 3: 116 µs per loop

With mkl.set_num_threads(2) I get:

10000 loops, best of 3: 103 µs per loop
10000 loops, best of 3: 81.1 µs per loop

With mkl.set_num_threads(4) I get:

10000 loops, best of 3: 103 µs per loop
10000 loops, best of 3: 71.4 µs per loop

So the take-home message: MKL's FFT is slower than scipy's FFT with num_threads = 1, but as soon as we allow 2 or more threads MKL wins, with a speedup of up to about 1.4x (103 µs vs. 71.4 µs) for a power-of-2 length in this case.

Now if the length of x is not a power of 2 (x = np.random.randn(2 ** 13 - 1), i.e. 8191 samples, which is a prime length), then with mkl.set_num_threads(4) I see:

10 loops, best of 3: 62.3 ms per loop
1000 loops, best of 3: 703 µs per loop

so MKL is about 88x faster in this case...
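For reference, the speedups implied by the timings quoted in this comment can be recomputed with a few lines of arithmetic. This is only a sketch over the numbers as posted (scipy first, MKL second); the dictionary labels and the `speedup` helper are made up for illustration and are not part of the PR.

```python
# Timings quoted above, in microseconds, as (scipy, MKL) pairs.
timings_us = {
    "2**13, 1 thread": (103.0, 116.0),
    "2**13, 2 threads": (103.0, 81.1),
    "2**13, 4 threads": (103.0, 71.4),
    "2**13 - 1, 4 threads": (62.3e3, 703.0),  # 62.3 ms converted to µs
}


def speedup(scipy_us, mkl_us):
    """How many times faster MKL is than scipy (values < 1 mean MKL is slower)."""
    return scipy_us / mkl_us


speedups = {label: speedup(*pair) for label, pair in timings_us.items()}

for label, value in speedups.items():
    print(f"{label}: {value:.2f}x")
```

With these numbers, MKL comes out at roughly 0.89x with one thread, 1.27x with two, 1.44x with four, and a bit over 88x for the non-power-of-2 length.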

we need to bring the PR back to life...

@jona-sassenhagen
Contributor

Wouldn't it make sense to approach this like _get_fast_dot? E.g., have a function that tries to return mkl's fft, and if it can't, it returns the default scipy fft?

@dengemann
Member

Seems like Jona has some vision here and is volunteering :)

On Mon, Nov 16, 2015 at 11:16 AM, jona-sassenhagen wrote (#1916):

> Wouldn't it make sense to approach this like _get_fast_dot? E.g., have a
> function that tries to return mkl's fft, and if it can't, it returns the
> default scipy fft?

@agramfort
Member

agramfort commented Nov 16, 2015 via email

@jona-sassenhagen
Contributor

Hm ... I'll see if I can get around to it. I was just trying to get our boss to buy a GPU/CUDA machine :)

As this is 1. restricted to TFR, 2. applies to 3 or so functions, should the get_X functions go in mne.utils or in tfr?

@agramfort
Member

agramfort commented Nov 16, 2015 via email

@jona-sassenhagen
Contributor

Argh ... it's always more complicated than you think ... how to set the # of threads? I guess we don't want to use the max # of threads by default. We can set to max number of threads minus one by default (and if there's only 2, we return the scipy version). Or we could read from n_jobs. Or set a global, user-accessible parameter. Or ...
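The first option (max threads minus one, falling back to scipy on machines with only two) could be sketched roughly as below. `choose_fft_threads` is a hypothetical name, and using `os.cpu_count()` as the "max number of threads" is an assumption; the returned value would presumably be what gets passed to `mkl.set_num_threads`.

```python
import os


def choose_fft_threads(n_cpus=None):
    """Pick a thread count for the MKL FFT, or None to signal "use scipy".

    Default policy from the comment above: all available threads minus one,
    and no MKL at all when only two (or fewer) are available.
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1  # cpu_count() may return None
    if n_cpus <= 2:
        return None  # caller should fall back to the scipy FFT
    return n_cpus - 1
```

For example, `choose_fft_threads(4)` gives 3, while `choose_fft_threads(2)` gives `None`. Reading the value from n_jobs or from a global setting would only change how `n_cpus` is supplied.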

@agramfort
Member

agramfort commented Nov 16, 2015 via email

@jona-sassenhagen
Contributor

Ok I'll see if I can get around to it. Probably once I start doing more time-frequency stuff myself again.

@jona-sassenhagen
Contributor

If I understand it correctly, for MKL fft in welch-reliant stuff (e.g. PSDs), we'd have to reimplement them?

@agramfort
Member

agramfort commented Nov 17, 2015 via email

@agramfort
Member

closing in favor of #2623

@agramfort agramfort closed this Nov 23, 2015