fix sfreq estimation for snirf files #13184
Conversation
Thanks for the fix, I attached the file to the issue, here's the link again: https://github.com/mne-tools/mne-python/files/14665643/WR11.Baseline.SNIRF.zip
Thanks @kdarti (and sorry I missed the fact that you'd already provided it!) In fact the changes I've made so far will not address your problem, as the times in your file don't even yield a single estimate of period length when rounding to 4 decimals (so this file triggers the code path for non-uniformly sampled data):

```python
import matplotlib.pyplot as plt
import numpy as np
from h5py import File
from tabulate import tabulate

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    times = np.array(fid["nirs/data1/time"])
periods = np.diff(times)
sfreqs = 1 / periods
uniq_sfreq, counts = np.unique(sfreqs, return_counts=True)
print(tabulate(list(zip(uniq_sfreq, counts)), headers=("sfreq", "count"), floatfmt=(".6f",)))
# sfreq count
# --------- -------
# 49.951218 24433
# 49.989323 6016
# 49.998856 2232
# 49.999451 8
# 49.999748 8
# 49.999897 8
# 49.999973 9
# 49.999992 2
# 50.000000 4
# 50.000011 6
# 50.000050 347
# 50.001240 192
# 50.003624 192
# 50.008392 1984
# 50.027481 9984
# 50.103977 8192
# 50.257668 460
sfreqs.mean().item() # mean
# 50.00006866455078
np.median(sfreqs).item() # median
# 49.989322662353516
uniq_sfreq[np.argmax(counts)].item() # mode
# 49.95121765136719
x = np.arange(len(sfreqs))
fig, ax = plt.subplots(layout="constrained")
ax.hlines(50, 0, len(sfreqs), colors='r', linestyles="dotted")
ax.scatter(x, sfreqs, linewidths=0, s=4)
```

Put simply: any sane way I can think of to estimate the sfreq based on this vector of times is going to yield something that is not exactly 50 Hz.
Should we add a way for users to specify the sampling frequency themselves?
How would that work exactly? Would we take their passed-in value and just fake the time vector to match? I guess I also don't see what the problem is with our estimate of the sfreq being off by a tiny fraction of a Hz from the hardware's nominal sampling frequency.

To be clear: I think the change in this PR is a good one, but it turns out to be unrelated to the problem reported in #12508, and it's not clear to me what (if anything) should be done about #12508.
Yep. And we could still do the calculations, and if it's off by more than some tolerance, warn or error.
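For concreteness, a minimal sketch of what that could look like (the `faked_time_vector` helper and its `tol` default are hypothetical, not part of MNE's API):

```python
import numpy as np

def faked_time_vector(times, sfreq, tol=0.01):
    """Build a perfectly regular time vector from a user-given sfreq,
    erroring if it disagrees with the file's times by more than `tol`."""
    estimated = 1.0 / np.mean(np.diff(times))
    if abs(estimated - sfreq) > tol * sfreq:
        raise ValueError(
            f"estimated sfreq {estimated:.4f} Hz is more than {tol:.0%} away from {sfreq} Hz"
        )
    return np.arange(len(times)) / sfreq
```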
I think that plotting and time picking etc. all get a bit ugly/worse for the end user: instead of having nicely formatted values you'll have ugly/long ones.

If the data values were unevenly sampled, we could consider allowing a linear interpolation back onto a regular grid. It would probably be good enough assuming the data are sufficiently lowpassed. But given how small the deviations are above, I doubt it would change anything.
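For reference, a minimal sketch of the interpolation idea, assuming a known target `sfreq` (the `resample_to_grid` helper is hypothetical, not something MNE provides):

```python
import numpy as np

def resample_to_grid(times, data, sfreq):
    """Linearly interpolate each channel onto a uniform grid at `sfreq`.

    times: (n_samples,) increasing timestamps; data: (n_channels, n_samples).
    """
    n_out = int(np.floor((times[-1] - times[0]) * sfreq)) + 1
    grid = times[0] + np.arange(n_out) / sfreq
    return grid, np.vstack([np.interp(grid, times, ch) for ch in data])
```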
OK, that makes sense. Will adapt this PR accordingly.
Again, I'd like to note that there is nothing weird with the file: all values in `/nirs/data1/time` have 2 decimals and increment by 0.02. Just look at the weird values in `periods` and you can see that there are no weird values in `time` at those points. As mentioned in the issue, this is about float errors. Unless people specify the time vector using only start and end time (which is the other allowed option), this file is nothing out of the ordinary. Note that this isn't super important to fix, as the deviation from the true sample rate specified by `/nirs/data1/time` in the snirf file is small, but it is still wrong. With that said, it could be solved with fractions.
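A minimal sketch of the fractions idea, assuming Python's `fractions` module and that snapping to the nearest simple rational is acceptable (this is not what the PR implements):

```python
from fractions import Fraction

import numpy as np

# the float32 period, upcast to a Python float, as an exact binary fraction
period = Fraction(float(np.float32(0.02)))
print(period)  # 5368709/268435456 -- not 1/50
# snapping to the nearest "simple" rational recovers the nominal period exactly
print(period.limit_denominator(1000))      # 1/50
print(1 / period.limit_denominator(1000))  # 50, i.e. exactly 50 Hz
```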
@kdarti can you show me how you came to this conclusion? As far as I can tell, the data in the file are stored as float32, and the period lengths are *not* all exactly 0.02:

```python
import numpy as np
from h5py import File

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    print(fid["nirs/data1/time"].dtype)
    times = fid["nirs/data1/time"][:]
print("\nraw times:")
print(times)
print("\nunique period lengths:")
with np.printoptions(precision=12):
    print(np.unique(times[1:] - times[:-1]))
# float32
#
# raw times:
# [0.00000e+00 2.00000e-02 4.00000e-02 ... 1.08150e+03 1.08152e+03
# 1.08154e+03]
#
# unique period lengths:
# [0.019897461 0.019958496 0.019989014 0.019996643 0.01999855 0.019999504
# 0.01999998 0.019999996 0.02 0.020000003 0.02000001 0.02000004
# 0.0200001   0.02000022  0.020000458 0.020004272 0.020019531]
```

*edited code sample to make it clearer*
Put another way: there's little we can do to work around how the values are stored in the HDF5 file. They're stored as float32 values, so there's some inherent imprecision:

```python
import numpy as np

np.float32(0.02).as_integer_ratio()
# (5368709, 268435456)  # <---- notice it's not (1, 50)
np.float32(0.02).astype(np.float64)
# np.float64(0.019999999552965164)
np.float32(0.02).data.tolist()  # this upcasts to native python `float` (64-bit)
# 0.019999999552965164
```
My bad, I was in a bit too much of a rush. So with 32-bit floats this will always be the case, since float32 is allowed by the SNIRF spec. But since it'll still pass the jitter/variable-sample-rate check, it's not really a problem.
```python
# specified as onset, samplerate
sampling_rate = 1.0 / (time_data[1] - time_data[0])
```
This was a bug: length-2 `time_data` is `(onset, period)`, so the correct sampling rate would have been `1.0 / time_data[1]`.
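A sketch of the corrected logic (the `sfreq_from_snirf_time` helper is hypothetical; the real change lives in MNE's SNIRF reader):

```python
import numpy as np

def sfreq_from_snirf_time(time_data):
    """Hypothetical helper (not MNE's actual function) illustrating the fix."""
    if len(time_data) == 2:
        # per the SNIRF spec, a length-2 time vector is (onset, period), so
        # the old `1.0 / (time_data[1] - time_data[0])` was wrong whenever
        # the onset was nonzero
        return 1.0 / time_data[1]
    return 1.0 / np.mean(np.diff(time_data))

print(sfreq_from_snirf_time(np.array([5.0, 0.02])))  # 50.0 (old code gave ~-0.2)
```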
```python
uniq_periods = np.unique(periods.round(decimals=4))
if uniq_periods.size == 1:
    # Uniformly sampled data
    sampling_rate = 1.0 / uniq_periods.item()
```
This was also a bug: we shouldn't have used the rounded-to-4-decimals value. Now we use the mean of per-period sfreq estimates.
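In spirit, the new branch looks something like this sketch (variable names mirror the diff above; not the PR's exact code):

```python
import numpy as np

# example float32 time vector, nominally sampled at 50 Hz
time_data = np.arange(10000, dtype=np.float32) / np.float32(50.0)
periods = np.diff(time_data)
uniq_periods = np.unique(periods.round(decimals=4))
if uniq_periods.size == 1:
    # uniformly sampled: average the per-period estimates so that float32
    # storage noise cancels out, instead of inverting the rounded period
    sampling_rate = np.mean(1.0 / periods)
    print(sampling_rate)  # ~50.0
```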
```python
mean_period = np.mean(np.diff(orig_time))
acceptable_time_jitter[-1] += 0.0099 * mean_period
```
just renaming variable for clarity
```python
assert "Found jitter of 0.9" in lines
```

```diff
- # Add jitter of 1.01%, which is greater than allowed tolerance
+ # Add jitter of 1.02%, which is greater than allowed tolerance
```
1.01% no longer triggered the error, possibly because float imprecision meant that the final period was a bit shorter than expected already.
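A quick illustration of that kind of imprecision (illustrative values, not the test's actual data):

```python
import numpy as np

# even a "perfect" float32 time vector has periods that deviate slightly from
# the nominal 0.02 s, so a little jitter is present before the test adds any
times = np.arange(1000, dtype=np.float32) / np.float32(50.0)
periods = np.diff(times)
print(periods.min(), periods.max())  # not both exactly 0.02
```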
All green! Ready for review. Note that I put in one not-yet-merged suggestion through the GitHub UI, to change the docstring in a new test.
larsoner left a comment:
Looks good, just a couple minor docstring tweaks. I'll commit those and yours with a `[ci skip]` and merge, thanks @drammock!
Co-authored-by: Daniel McCloy <dan@mccloy.info>
... decided just to let CIs work their magic and mark for merge-when-green instead |
```text
* upstream/main: (40 commits)
  fix typo (missing space) that messed up rst rendering (mne-tools#13217)
  MAINT: Restore VTK dev (mne-tools#13214)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13212)
  BUG: Fix bug with example (mne-tools#13210)
  MAINT: Fix pip-pre with PyVista (mne-tools#13207)
  Move FCBG to former partners (mne-tools#13205)
  ENH: Update related software list (mne-tools#13202)
  fix sfreq estimation for snirf files (mne-tools#13184)
  ENH: Use data-based padding instead of "odd" padding when filtering in raw.plot (mne-tools#13183)
  FIX: Bumps (mne-tools#13198)
  DOC: fix typo in examples/io/read_impedances.py (mne-tools#13197)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13173)
  FIX make_watershed_bem to handle missing talairach_with_skull.lta courtesy Freesurfer 8 (mne-tools#13172)
  ENH: Add upsampling for MEG helmet surface (mne-tools#13179)
  MAINT: Update code credit (mne-tools#13180)
  BUG: Fix bug with least-squares sphere fit (mne-tools#13178)
  fix EDF export (mne-tools#13174)
  fix typo (mne-tools#13171)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13164)
  Fix dev installation guide (mne-tools#13163)
  ...
```

closes #12508
@kdarti I think this will fix the sampling rate estimation problem for snirf files. Can you provide the file that was showing the problem for you, so I can test it / write a test that emulates that case?