ENH: Speed up set_bipolar_reference #9270

marsipu · 2021-04-09T01:03:07Z

Reference issue

What does this implement/fix?

This improves the performance of adding a large number (>25) of bipolar channels, as suggested by @jasmainak.
EDIT: A gist to measure the speed can be found here

Additional information

I had some difficulty deciding how best to organize the creation of the new channels with the new method, and how to handle Info while staying conservative. I think a large part of the performance improvement comes from moving add_channels outside the loop, as done in the last commit 29f609d2. Before that, even with the new matrix-multiplication method, performance was much worse for n > 25.
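One plausible contributor to the slowdown with many channels is that growing an array one channel at a time copies all previously added data on every step, so the total amount of copying grows roughly quadratically with the number of new channels. The NumPy-only sketch below contrasts the two strategies. It does not model add_channels itself, which also merges and checks Info; `append_in_loop` and `append_once` are hypothetical names.

```python
import numpy as np


def append_in_loop(data, new_rows):
    """Grow the array one channel at a time, like repeated add_channels calls.

    Each np.concatenate allocates a new array and copies everything so far.
    """
    out = data
    for row in new_rows:
        out = np.concatenate([out, row[np.newaxis, :]], axis=0)
    return out


def append_once(data, new_rows):
    """Stack all new channels first, then concatenate a single time."""
    return np.concatenate([data, np.asarray(new_rows)], axis=0)


rng = np.random.default_rng(0)
data = rng.standard_normal((10, 1000))  # 10 channels x 1000 samples
new_rows = [rng.standard_normal(1000) for _ in range(5)]
assert np.array_equal(append_in_loop(data, new_rows), append_once(data, new_rows))
```

Both return the same array; only the amount of intermediate copying differs.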

At the moment, the new bipolar channels are always appended (as when drop_refs=False) instead of taking the place of the anode as in the previous implementation. If that is important, I could add a reorder_channels call afterwards so that the bipolar channels take the place of the anodes again.
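If the bipolar channels should take the place of their anodes again, the target order could be computed up front and applied with a single reorder_channels call. This is a minimal pure-Python sketch: `bipolar_order` is a hypothetical helper, and it assumes that with drop_refs=True both anodes and cathodes are removed.

```python
def bipolar_order(ch_names, anode, cathode, bipolar_names, drop_refs=True):
    """Return channel names with each bipolar channel placed where its anode was.

    Hypothetical helper: the result is meant to be passed to reorder_channels.
    """
    # An anode may be used in several pairs, so collect all bipolar names per anode
    by_anode = {}
    for an, name in zip(anode, bipolar_names):
        by_anode.setdefault(an, []).append(name)
    # With drop_refs=True, both anodes and cathodes disappear from the result
    dropped = (set(anode) | set(cathode)) if drop_refs else set()
    order = []
    for ch in ch_names:
        if ch not in dropped:
            order.append(ch)
        # Insert the bipolar channel(s) at the anode's position
        order.extend(by_anode.get(ch, []))
    return order


print(bipolar_order(["A", "B", "C", "D"], ["A", "C"], ["B", "D"], ["A-B", "C-D"]))
# ['A-B', 'C-D']
print(bipolar_order(["A", "B", "C", "D"], ["A", "C"], ["B", "D"], ["A-B", "C-D"],
                    drop_refs=False))
# ['A', 'A-B', 'B', 'C', 'C-D', 'D']
```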

Because the bipolar channels no longer take the place of the anodes, I had to adjust the test slightly for now so that it passes.
Besides picks, UpdateChannelsMixin has another attribute, _projector, which also seemed to become outdated when the channels in Info change.

Was it intentional that the bipolar reference for epochs was created from the mean of the anode epochs?

ref_data = data[..., ref_from, :].mean(-2, keepdims=True)

In the new implementation, it is created from the anode signal of each individual epoch, but this could be changed to the epoch mean.

ref_inst._data = np.asarray([multiplier.dot(ep) for ep in inst._data])
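For reference, assuming `data` is laid out as (n_epochs, n_channels, n_times) like MNE epochs data, the `.mean(-2, keepdims=True)` in the old expression reduces over the channel axis, not the epoch axis. A small NumPy check with random data and a hypothetical single-channel `ref_from`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 4, 3, 50
data = rng.standard_normal((n_epochs, n_channels, n_times))

ref_from = [2]  # hypothetical: a single cathode, given as a list of channel indices
# Old expression: average over axis -2, i.e. over the selected *channels*
old_ref = data[..., ref_from, :].mean(-2, keepdims=True)
# With one reference channel, that "average" is just the channel itself, epoch by epoch
assert old_ref.shape == (n_epochs, 1, n_times)
assert np.allclose(old_ref[:, 0, :], data[:, 2, :])

# Averaging over epochs would be a different reduction (axis 0)
epoch_mean = data[:, ref_from, :].mean(0, keepdims=True)
assert epoch_mean.shape == (1, 1, n_times)
```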

I changed the location of the bipolar channel from zero to the location of the cathode, as suggested by @jasmainak. I wondered why every other 'chs' attribute should still be taken from the anode. Couldn't we use the anode location instead, or take all the other Info attributes from the cathode?

What do you think about these changes?

jasmainak · 2021-04-09T02:08:47Z

mne/io/reference.py

+    # channel in anode multiple times)
+    ref_instances = list()
+    for ch_idx, (an, ca, name, info) in enumerate(zip(anode, cathode,
+                                                      ch_name, ch_info)):


couldn't you simply do:

ref_inst = inst.copy().pick_channels(cathode)
ref_inst.rename_channels(...)

It will read more cleanly and may even be faster.

I did this at first, but it fails when a channel occurs multiple times in anode or cathode, because pick_channels converts the names to a set. It would be cleaner and probably faster, but at the cost that each channel could only be used once as an anode and once as a cathode.

There is a test for this in test_reference, which then fails.
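The difference can be seen without MNE: going through a set collapses repeated names, while integer indexing returns one row per requested channel. A small sketch with made-up channel names; the set-based line only mimics the pick_channels behavior described above, it is not its implementation.

```python
import numpy as np

ch_names = ["E1", "E2", "E3"]
cathode = ["E2", "E2", "E3"]  # E2 serves as cathode in two pairs

# Set-based selection (mimicking the behavior described above): duplicates collapse
via_set = [ch for ch in ch_names if ch in set(cathode)]
print(via_set)  # ['E2', 'E3']: only 2 channels for 3 pairs

# Integer indexing returns one row per requested channel, duplicates included
data = np.arange(12).reshape(3, 4)  # 3 channels x 4 samples
idx = [ch_names.index(ch) for ch in cathode]  # [1, 1, 2]
picked = data[idx]
print(picked.shape)  # (3, 4)
```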

ughgh ... I see. I would try to avoid using add_channels because that will probably make unnecessary copies of the data (for each new channel?). Can you use RawArray or EvokedArray or some such thing to construct the final instance in one go? I suspect it will be faster


Indeed, iterating will be slower than a vectorized solution. I would figure out which dot product needs to be done, then make a single raw.drop_channels call, followed by a single raw.add_channels call with the new data. The overhead from this should be much smaller.
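The vectorized idea can be sketched as one (n_pairs, n_channels) matrix with +1 at each anode and -1 at each cathode, so that a single matrix product yields all bipolar signals at once. A minimal NumPy sketch; `bipolar_multiplier` is a hypothetical helper, not the code that was merged.

```python
import numpy as np


def bipolar_multiplier(ch_names, anode, cathode):
    """Build a (n_pairs, n_channels) matrix: +1 at each anode, -1 at each cathode.

    Row i of ``multiplier @ data`` is then data[anode[i]] - data[cathode[i]].
    A channel may appear in several pairs, since each pair gets its own row.
    """
    multiplier = np.zeros((len(anode), len(ch_names)))
    for row, (an, ca) in enumerate(zip(anode, cathode)):
        # += / -= so that a degenerate pair (anode == cathode) yields a zero row
        multiplier[row, ch_names.index(an)] += 1.0
        multiplier[row, ch_names.index(ca)] -= 1.0
    return multiplier


ch_names = ["E1", "E2", "E3"]
anode = ["E1", "E1", "E2"]
cathode = ["E2", "E3", "E3"]
data = np.array([[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]])  # 3 channels x 2 samples

M = bipolar_multiplier(ch_names, anode, cathode)
bipolar = M @ data  # one matrix product for all pairs
print(bipolar)
```

The loop only fills the small matrix; the data itself is touched once, and the result could then be added in a single add_channels call.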

jasmainak · 2021-04-09T02:13:51Z

Can you share some performance benchmarks? Say I have 180 channels (so 179 anode-cathode pairs): how long does it take to compute the bipolar reference with the old and new implementations for a) raw, b) epochs, and c) evoked?

jasmainak · 2021-04-09T02:14:55Z

mne/io/reference.py

+                                             force_update_info=True)
+
+    if isinstance(inst, BaseEpochs):
+        ref_inst._data = np.asarray([multiplier.dot(ep) for ep in inst._data])


I would leave this for the end but there might be some performance benefits of using einsum for the epochs. I always get confused when using it though

@ has much saner broadcasting behavior than np.dot (@ operates along the last two dims of both inputs, and np.dot... doesn't); this is probably the same as multiplier @ inst._data.
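A quick NumPy check of this, assuming epochs data shaped (n_epochs, n_channels, n_times) and a (n_pairs, n_channels) multiplier; the einsum line is the variant mentioned earlier in this thread, written out for comparison. On the same inputs, np.dot returns shape (n_pairs, n_epochs, n_times), which is why the per-epoch loop was needed with it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times, n_pairs = 5, 4, 100, 3
data = rng.standard_normal((n_epochs, n_channels, n_times))
multiplier = rng.standard_normal((n_pairs, n_channels))

# Per-epoch loop, as in the original implementation
looped = np.asarray([multiplier.dot(ep) for ep in data])
# @ broadcasts the 2-D multiplier over the leading (epoch) axis of the 3-D data
matmul = multiplier @ data
# Equivalent einsum: sum over channel index c for every epoch e and time t
einsum = np.einsum("pc,ect->ept", multiplier, data)

assert looped.shape == matmul.shape == einsum.shape == (n_epochs, n_pairs, n_times)
assert np.allclose(looped, matmul) and np.allclose(looped, einsum)
# np.dot contracts differently on N-D inputs, giving a transposed layout
assert np.dot(multiplier, data).shape == (n_pairs, n_epochs, n_times)
```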

Thank you, @ works great. It doesn't seem to greatly improve performance though.

It appears to me that the re-referencing itself takes much less computation time than the overhead of creating and checking Info in add_channels. You can try it out by replacing lines 509-511 with:

if isinstance(ref_instances, list):
    ref_instances = add_inst
else:
    ref_instances.add_channels([add_inst], force_update_info=True)
ref_inst = ref_instances

marsipu · 2021-04-09T18:37:37Z


I uploaded a test file for measuring speed.
With this test file, the old implementation raises an error for epochs; this is now fixed.
Raw and Evoked alone should already demonstrate the drastic improvement in performance:

Before fix (- epochs):
Creation of 26 bipolar channels for Raw&Evoked took: 38.687 s

After fix (- epochs, - @-operator):
Creation of 26 bipolar channels for Raw&Evoked took: 1.801 s

After fix (+ epochs, - @-operator):
Creation of 26 bipolar channels for Raw&Evoked took: 3.241 s

After fix (+ epochs, + @-operator):
Creation of 26 bipolar channels for Raw&Evoked took: 3.218 s

I set the number to 26 because, at least on my laptop, the computation time seems to increase exponentially above ~20 channels (EDIT: and the computation doesn't finish). I don't really understand why; I assume it is due to some overhead introduced by the add_channels code when it is called many times in a row.
Can you reproduce this behaviour?

jasmainak · 2021-04-09T20:02:27Z

Excellent, thanks for the benchmarks! You can edit your original description and add them there so they are not lost among the comments, and we can update it as you make more optimizations. 180 was the number of channels in my dataset, which is how I discovered the problem. You can see the dramatic improvement even with 26 channels :)

I think at this point, you can also try snakeviz or line_profiler to see which lines are the slowest. Like you, I suspect add_channels, but profiling will make it concrete.
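For a quick first look without extra tools, the standard-library cProfile and pstats modules can rank functions by cumulative time. A minimal sketch on a toy workload (`build_in_loop` is a hypothetical stand-in, not MNE code); to browse the same data in snakeviz, the stats could be written to a file with `stats.dump_stats(...)` and opened from the command line.

```python
import cProfile
import io
import pstats

import numpy as np


def build_in_loop(n_new, n_times=1000):
    """Toy stand-in for repeated add_channels calls (hypothetical workload)."""
    out = np.zeros((1, n_times))
    for _ in range(n_new):
        out = np.concatenate([out, np.ones((1, n_times))], axis=0)
    return out


profiler = cProfile.Profile()
profiler.enable()
result = build_in_loop(200)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show only the 5 most expensive entries
print(stream.getvalue())
```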

I think we should aim for sub-second speed, referencing should be instantaneous!

PS: it might be easier if you share a gist that works with the sample data (rather than a custom file), so we can copy-paste it instantly and try things if you need help.

marsipu · 2021-04-20T14:40:46Z

A quick question:
I don't really understand why the electrode information is taken from the anode, but the location should be taken from the cathode.
Is this a convention?

agramfort · 2021-04-20T14:49:51Z

It should not be. The chs['loc'] should ideally contain the 2 locations, with the 2nd location being the reference used.

…
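If the two-location layout described here were adopted, the loc array of a bipolar channel could be built roughly as follows (the thread below ends up copying all channel info, including location, from the anode). This assumes MNE's 12-element chs['loc'] layout for EEG, in which loc[:3] is the electrode position and loc[3:6] the reference position; `bipolar_loc` is a hypothetical helper.

```python
import numpy as np


def bipolar_loc(anode_loc, cathode_loc):
    """Sketch: loc for a bipolar EEG channel, anode position plus cathode as reference.

    Assumes MNE's 12-element EEG layout: loc[:3] is the electrode position and
    loc[3:6] the position of the reference electrode.
    """
    loc = np.array(anode_loc, dtype=float)  # copy, so the anode's entry is untouched
    loc[3:6] = np.asarray(cathode_loc, dtype=float)[:3]  # cathode position as reference
    return loc


anode_loc = np.r_[0.01, 0.02, 0.03, np.zeros(9)]  # made-up positions
cathode_loc = np.r_[0.04, 0.05, 0.06, np.zeros(9)]
loc = bipolar_loc(anode_loc, cathode_loc)
```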

marsipu · 2021-04-20T17:17:41Z


OK, thank you. Now all channel info, including the location, is copied from the anode.

marsipu · 2021-04-20T17:52:44Z

In this gist I uploaded the profiling output from before and after the most recent improvements. Most of the time was indeed spent on the repeated add_channels calls and on creating an instance for each new bipolar channel.
Now, as suggested, only one instance is created for the new re-referenced channels, and it is then added once to the original instance. I chose to keep this single add_channels call because I didn't want to miss anything related to merging data and Info, even though the two instances should be very similar. Or do you think this should also be replaced by a custom function for adding the re-referenced data?

Performance before recent improvements:

Creation of 26 bipolar channels took:
Raw: 1.642 s
Epochs: 2.073 s
Evoked: 0.803 s

Creation of 180 bipolar channels took:
Raw: 11.286 s
Epochs: 14.424 s
Evoked: 5.513 s

Performance after recent improvements:

Creation of 26 bipolar channels took:
Raw: 0.165 s
Epochs: 0.275 s
Evoked: 0.054 s

Creation of 180 bipolar channels took:
Raw: 0.358 s
Epochs: 0.633 s
Evoked: 0.055 s

drammock · 2021-04-20T18:47:21Z

@marsipu what is the reason for [ci skip] here? The code changes are substantial, they should be tested. Can you git commit --amend to remove the [ci skip] from your last commit, and then git push --force? Or alternatively, git commit --allow-empty -m "trigger CIs" and then git push.

marsipu · 2021-04-20T19:15:54Z


@drammock Oh, I am sorry, I thought it was good practice to skip CI for minor changes to save resources for other PRs. But I see now that I should probably have run at least the documentation CI again for cb2e3d0. CI did run for all major changes before. When is it allowed or advisable to skip CI?
I will trigger CI again as you instructed.

drammock · 2021-04-20T19:34:28Z


It is generally speaking not advisable to skip CIs. The one exception that I'm willing to put in writing is this: if all you changed is documentation (tutorial, example, and/or docstring), then [skip azp][skip github] is acceptable.

The reason for this is that the CIs can catch problems unrelated to your changes, that cropped up due to new versions of our dependencies being released. The sooner we catch those, the sooner we can fix them before real users encounter them. Two such examples came up just yesterday: #9321 and pydata/pydata-sphinx-theme#395.

The other case where you might do [ci skip] is if your PR is a draft PR and you know that the commits you're pushing aren't yet enough to fix the bug / implement the new feature, and you don't want to waste energy by running the CIs on something you already know will fail.

marsipu · 2021-04-20T20:14:08Z


Thank you for the clarification, I will respect that from now on.

larsoner

LGTM +1 for merge with or without my test suggestion. Thanks @marsipu !

mne/io/tests/test_reference.py


jasmainak

Fantastic PR. Thanks @marsipu !

doc/changes/latest.inc

Co-authored-by: Mainak Jas <jasmainak@users.noreply.github.com>

marsipu · 2021-04-21T16:53:52Z

There seems to be an error in test_report.py::test_scraper, which doesn't seem to be related to this PR, right?
I could maybe fix this with exist_ok=True for makedirs in the line throwing the error, but I am not sure whether this defeats the purpose of the test.
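For context on that trade-off: os.makedirs raises FileExistsError if the target directory already exists, and exist_ok=True suppresses exactly that error. If the test relies on the directory not existing yet, passing exist_ok=True would hide that condition. A small standard-library illustration (the directory name is made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "figures")
    os.makedirs(target)  # first call creates the directory
    try:
        os.makedirs(target)  # second call without exist_ok
        raised = False
    except FileExistsError:
        raised = True
    os.makedirs(target, exist_ok=True)  # same call, but an existing directory is accepted
    still_exists = os.path.isdir(target)

print(raised, still_exists)  # True True
```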

drammock · 2021-04-21T22:28:35Z

I've opened #9332 to deal with the failing scraper tests.

larsoner · 2021-04-21T23:42:32Z

Close-reopen cycle to restart all CIs

larsoner · 2021-04-22T14:57:09Z

Thanks @marsipu !

marsipu changed the title ~~Bipolar faster~~ ENH: Faster set_bipolar_reference Apr 9, 2021

marsipu changed the title ~~ENH: Faster set_bipolar_reference~~ ENH: Speed up set_bipolar_reference Apr 9, 2021

jasmainak reviewed Apr 9, 2021


marsipu changed the title ~~ENH: Speed up set_bipolar_reference~~ WIP: Speed up set_bipolar_reference Apr 16, 2021

marsipu force-pushed the bipolar_faster branch from 164d5c2 to 60f1e7a April 20, 2021 17:15

marsipu requested review from jasmainak and larsoner April 20, 2021 18:13

marsipu changed the title ~~WIP: Speed up set_bipolar_reference~~ ENH: Speed up set_bipolar_reference Apr 20, 2021

marsipu force-pushed the bipolar_faster branch from 0b700f7 to fc8ac07 April 20, 2021 19:18

larsoner approved these changes Apr 21, 2021


mne/io/tests/test_reference.py (outdated, resolved)

larsoner added this to the 0.23 milestone Apr 21, 2021

marsipu added 7 commits April 21, 2021 16:58

Add cathode-location and fix doc

a8887f2

Implementing matrix-multiplication approach (by @jasmainak)

7eb5e24

Fix returning ref_to from _check_before_reference

b2733d7

Refine how Info is copied to new channels to avoid mixups and pass tests

868f19b

Adjust test comparing info of anode/cathode and bipolar-channel to also pass when bipolar-channels are appended

ba371f2

Concatenation of Reference-Instances outside the loop

9a42e26

Using @-operator for matrix-multiplication

daeade2

marsipu added 8 commits April 21, 2021 16:58

Update test_set_bipolar_reference to show info-keys responsible for the errors

3cd153b

Improve performance by creating reference-instance from scratch

000005c

channel-information including location is taken from anode

ed55f40

Update test for info just taken from anode

0c1f771

Fix import of create_info, improve docs[ci skip]

c555541

Addition to latest.inc

53c7392

Fix latest.inc

c4df65f

Refactor assert-statements in test_reference.py as suggested by @larsoner

ada509e

marsipu force-pushed the bipolar_faster branch from c3000f7 to ada509e April 21, 2021 14:59

jasmainak approved these changes Apr 21, 2021


jasmainak reviewed Apr 21, 2021


doc/changes/latest.inc (outdated, resolved)

Update latest.inc as suggested by @jasmainak

70b6766

Co-authored-by: Mainak Jas <jasmainak@users.noreply.github.com>

agramfort approved these changes Apr 21, 2021


larsoner closed this Apr 21, 2021

larsoner reopened this Apr 21, 2021

larsoner merged commit 350f76b into mne-tools:main Apr 22, 2021

marsipu deleted the bipolar_faster branch April 22, 2021 15:03
