
Conversation

@drammock
Member

@drammock drammock commented Apr 3, 2025

closes #12508

@kdarti I think this will fix the sampling rate estimation problem for snirf files. Can you provide the file that was showing the problem for you, so I can test it / write a test that emulates that case?

@drammock drammock changed the title fix sfreq estimation [ci skip] fix sfreq estimation for snirf files Apr 3, 2025
@kdarti

kdarti commented Apr 4, 2025

closes #12508

@kdarti I think this will fix the sampling rate estimation problem for snirf files. Can you provide the file that was showing the problem for you, so I can test it / write a test that emulates that case?

Thanks for the fix, I attached the file to the issue, here's the link again: https://github.com/mne-tools/mne-python/files/14665643/WR11.Baseline.SNIRF.zip

@drammock
Member Author

drammock commented Apr 4, 2025

Thanks @kdarti (and sorry I missed the fact that you'd already provided it!)

In fact the changes I've made so far will not address your problem, as the times in your file don't even yield a single estimate of period length when rounding to 4 decimals (so this file triggers the else branch, which I haven't modified here). I poked around with the times in that file and here's what I find:

import matplotlib.pyplot as plt
import numpy as np
from h5py import File
from tabulate import tabulate

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    times = np.array(fid["nirs/data1/time"])

periods = np.diff(times)
sfreqs = 1 / periods
uniq_sfreq, counts = np.unique(sfreqs, return_counts=True)
print(tabulate(list(zip(uniq_sfreq, counts)), headers=("sfreq", "count"), floatfmt=(".6f",)))
#     sfreq    count
# ---------  -------
# 49.951218    24433
# 49.989323     6016
# 49.998856     2232
# 49.999451        8
# 49.999748        8
# 49.999897        8
# 49.999973        9
# 49.999992        2
# 50.000000        4
# 50.000011        6
# 50.000050      347
# 50.001240      192
# 50.003624      192
# 50.008392     1984
# 50.027481     9984
# 50.103977     8192
# 50.257668      460

sfreqs.mean().item()  # mean
# 50.00006866455078
np.median(sfreqs).item()  # median
# 49.989322662353516
uniq_sfreq[np.argmax(counts)].item()  # mode
# 49.95121765136719

x = np.arange(len(sfreqs))
fig, ax = plt.subplots(layout="constrained")
ax.hlines(50, 0, len(sfreqs), colors='r', linestyles="dotted")
ax.scatter(x, sfreqs, linewidths=0, s=4)

[Figure 1: per-sample sfreq estimates scattered against sample index, with a dotted red line at 50 Hz]

Put simply: any sane way I can think of to estimate the sfreq based on this vector of times is going to yield something that is not exactly 50 Hz.

@larsoner
Member

larsoner commented Apr 4, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

@drammock
Member Author

drammock commented Apr 4, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

How would that work exactly? Would we take their passed-in value and just fake the time vector as np.arange(len(times_in_the_snirf_file)) / passed_in_sfreq? That seems error-prone to me (as in, if the user was wrong about the sfreq, or had a typo, their results could look very different).

I guess I also don't see what the problem is with our estimate of the sfreq being off by a tiny fraction of a Hz from the hardware's nominal sampling frequency. To be clear: I think the change in this PR is a good one, but it turns out to be unrelated to the problem reported in #12508, and it's not clear to me what (if anything) should be done about #12508.
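For concreteness, the approach being proposed would presumably look something like this (a hypothetical sketch; `fake_time_vector` is a made-up name, not anything in the reader code):

```python
import numpy as np


def fake_time_vector(n_samples, sfreq):
    """Hypothetical helper: rebuild an evenly spaced time vector from a
    user-supplied sampling frequency, ignoring the file's stored times."""
    return np.arange(n_samples) / sfreq


times = fake_time_vector(5, 50.0)
print(times.tolist())  # [0.0, 0.02, 0.04, 0.06, 0.08]
```

If the user passes a wrong sfreq here, every downstream timestamp is silently wrong, which is the error-proneness concern.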

@larsoner
Member

larsoner commented Apr 4, 2025

Would we take their passed-in value and just fake the time vector as...

Yep. And we could still do the calculations, and if the result is off by rtol=0.01 or something, at least emit a warning (which it wouldn't be for the dataset above). I don't think it would be overly error-prone since people would need to set it explicitly, and we could emit a warning if it's wrong. If we wanted to be really careful we could add an on_mismatch="raise" option, but since they have to pass a value explicitly to activate that functionality it seems a bit overkill.
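For concreteness, that check might look something like this sketch (`check_user_sfreq` and its signature are made up for illustration, not anything in the reader code):

```python
import warnings

import numpy as np


def check_user_sfreq(user_sfreq, times, rtol=0.01):
    """Hypothetical check: warn if a user-supplied sfreq disagrees with
    the estimate from the file's time vector by more than rtol."""
    estimated = 1.0 / np.mean(np.diff(times))
    if not np.isclose(user_sfreq, estimated, rtol=rtol):
        warnings.warn(
            f"User-supplied sfreq {user_sfreq} differs from the estimated "
            f"{estimated:.3f} Hz by more than {rtol:.0%}"
        )
    return user_sfreq


times = np.arange(100) * 0.02  # an (almost exactly) 50 Hz grid
check_user_sfreq(50.0, times)  # within tolerance, so silent
```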

I guess I also don't see what the problem is with our estimate of the sfreq being off by a tiny fraction of a Hz from the hardware's nominal sampling frequency

I think that plotting and time picking etc. all get a bit ugly/worse for the end user. Instead of having nicely formatted values you'll have ugly/long ones like we have for sample. Not a deal breaker or anything, just slightly less nice an experience.

If the data values were unevenly sampled, we could consider allowing a linear interpolation back to an even grid. It would probably be good enough assuming the data are sufficiently lowpassed. But given how small the deviations are above, I doubt it would change anything.
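A hedged sketch of what that interpolation could look like with `np.interp`, on synthetic jittered data (not anything this PR implements):

```python
import numpy as np

rng = np.random.default_rng(0)
times = np.arange(100) * 0.02 + rng.normal(0, 1e-4, size=100)  # jittered 50 Hz grid
times.sort()  # jitter is tiny, but guarantee monotonicity for np.interp
data = np.sin(2 * np.pi * times)  # a 1 Hz signal, far below Nyquist

# resample onto an even 50 Hz grid via linear interpolation
grid = np.arange(times[0], times[-1], 1 / 50.0)
data_even = np.interp(grid, times, data)
```

Because the signal is well below Nyquist, the interpolated values stay very close to the true waveform on the even grid.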

@drammock
Member Author

drammock commented Apr 4, 2025

ok that makes sense. Will adapt this PR accordingly

@kdarti

kdarti commented Apr 7, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

Again, I'd like to note that there is nothing weird with the file: all values in /nirs/data1/time have 2 decimals and increment by 0.02. If you look up where the weird values in periods occur, you'll see there are no weird values in time at those points. As mentioned in the issue, this is about float errors.

Unless people specify the time vector using only start and end time (which is the other allowed option), this file is nothing out of the ordinary. Note that this isn't super important to fix, as the deviation from the true sample rate specified by /nirs/data1/time in the SNIRF file is small, but it is still wrong. With that said, it could be solved with fractions.
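To illustrate the fractions idea, here's a sketch (not what this PR implements): snap each float period to a limited-denominator rational before inverting.

```python
from fractions import Fraction

import numpy as np

# float32 periods like those stored in the file, upcast to float64
periods = np.full(5, np.float32(0.02)).astype(np.float64)
print(periods[0])  # 0.019999999552965164

# snapping to a small-denominator rational recovers the nominal period
snapped = [Fraction(p).limit_denominator(1000) for p in periods]
print(snapped[0])  # 1/50
print(float(1 / snapped[0]))  # 50.0
```

`limit_denominator` finds the closest fraction with denominator at most 1000, so the float32 rounding error is absorbed and the exact nominal rate falls out.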

@drammock
Member Author

drammock commented Apr 7, 2025

Again, I'd like to note that there is nothing weird with the file, all values in /nirs/data1/time have 2 decimals and values increment by 0.02.

@kdarti can you show me how you came to this conclusion? As far as I can tell, the data in the file are float32 values and are not exactly incrementing by 0.02:

import numpy as np
from h5py import File

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    print(fid["nirs/data1/time"].dtype)
    times = fid["nirs/data1/time"][:]
    print("\nraw times:")
    print(times)
    print("\nunique period lengths:")
    with np.printoptions(precision=12):
        print(np.unique(times[1:] - times[:-1]))
# float32
# 
# raw times:
# [0.00000e+00 2.00000e-02 4.00000e-02 ... 1.08150e+03 1.08152e+03
#  1.08154e+03]
# 
# unique period lengths:
# [0.019897461 0.019958496 0.019989014 0.019996643 0.01999855  0.019999504
#  0.01999998  0.019999996 0.02        0.020000003 0.02000001  0.02000004
#  0.0200001   0.02000022  0.020000458 0.020004272 0.020019531]

edited code sample to make it clearer

@drammock
Member Author

drammock commented Apr 7, 2025

Put another way: there's little we can do to work around how the values are stored in the HDF5 file. They're stored as float32 values, there's some inherent imprecision:

np.float32(0.02).as_integer_ratio()
# (5368709, 268435456)   # <---- notice it's not (1, 50)

np.float32(0.02).astype(np.float64)
# np.float64(0.019999999552965164)

np.float32(0.02).data.tolist()  # this upcasts to native python `float` (64-bit)
# 0.019999999552965164
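The flip side, as a quick sketch: once you upcast to float64 and round to a few decimals, the nominal period is recoverable, which is roughly what rounding-based estimates rely on.

```python
import numpy as np

period64 = float(np.float32(0.02))  # 0.019999999552965164
print(round(period64, 4))           # 0.02
print(1.0 / round(period64, 4))     # 50.0
```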

@kdarti

kdarti commented Apr 9, 2025

Again, I'd like to note that there is nothing weird with the file, all values in /nirs/data1/time have 2 decimals and values increment by 0.02.

@kdarti can you show me how you came to this conclusion? As far as I can tell, the data in the file are float32 values and are not exactly incrementing by 0.02:


My bad, I was in a bit too much of a rush. So with 32-bit floats this will always be the case, since float32 is allowed by the SNIRF spec. But since it still passes the jitter/variable-sample-rate check, it's not really a problem then.

@drammock drammock marked this pull request as ready for review April 9, 2025 16:59
Comment on lines -544 to -545
# specified as onset, samplerate
sampling_rate = 1.0 / (time_data[1] - time_data[0])
Member Author

this was a bug. length-2 time_data is (onset, period) so the correct sampling rate would have been 1.0 / time_data[1]
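A minimal sketch of the difference (illustrative values only, not the actual reader code):

```python
import numpy as np

# length-2 time vector per the SNIRF spec: (onset, period)
time_data = np.array([0.0, 0.02])
old = 1.0 / (time_data[1] - time_data[0])  # only correct because onset == 0
new = 1.0 / time_data[1]
print(old == new)  # True, masking the bug

# with a nonzero onset, the old formula falls apart:
time_data = np.array([5.0, 0.02])
print(1.0 / (time_data[1] - time_data[0]))  # ~ -0.2 (nonsense)
print(1.0 / time_data[1])                   # 50.0
```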

uniq_periods = np.unique(periods.round(decimals=4))
if uniq_periods.size == 1:
    # Uniformly sampled data
    sampling_rate = 1.0 / uniq_periods.item()
Member Author

this was also a bug. we shouldn't have used the rounded-to-4-decimals value. Now we use the mean of per-period sfreq estimates
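Standalone sketch of the new behavior (not the exact reader code), using a synthetic float32 time grid like the file above:

```python
import numpy as np

# synthetic float32 time grid like the one stored in the file
times = np.arange(1000, dtype=np.float32) * np.float32(0.02)
periods = np.diff(times).astype(np.float64)

# old: invert a single period rounded to 4 decimals (limited precision)
# new: mean of per-period sfreq estimates, no rounding
sfreq = np.mean(1.0 / periods).item()
print(sfreq)  # ~50.0, within float32 precision
```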

Comment on lines +556 to +557
mean_period = np.mean(np.diff(orig_time))
acceptable_time_jitter[-1] += 0.0099 * mean_period
Member Author

just renaming variable for clarity

assert "Found jitter of 0.9" in lines

# Add jitter of 1.01%, which is greater than allowed tolerance
# Add jitter of 1.02%, which is greater than allowed tolerance
Member Author

1.01% no longer triggered the error, possibly because float imprecision meant that the final period was a bit shorter than expected already.

@drammock
Copy link
Member Author

drammock commented Apr 9, 2025

all green! ready for review. Note that I put in one not-yet-merged suggestion through the GitHub UI, to change the docstring in a new test.

@larsoner larsoner left a comment
Member

Looks good, just a couple minor docstring tweaks. I'll commit those and yours with a [ci skip] and merge, thanks @drammock !

larsoner and others added 2 commits April 9, 2025 15:19
@larsoner larsoner enabled auto-merge (squash) April 9, 2025 19:20
@larsoner
Copy link
Member

larsoner commented Apr 9, 2025

... decided just to let CIs work their magic and mark for merge-when-green instead

@larsoner larsoner merged commit 729bdf3 into mne-tools:main Apr 9, 2025
30 checks passed
@drammock drammock deleted the snirf branch April 9, 2025 21:32
larsoner added a commit to SYXiao2002/mne-python that referenced this pull request Apr 18, 2025
* upstream/main: (40 commits)
  fix typo (missing space) that messed up rst rendering (mne-tools#13217)
  MAINT: Restore VTK dev (mne-tools#13214)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13212)
  BUG: Fix bug with example (mne-tools#13210)
  MAINT: Fix pip-pre with PyVista (mne-tools#13207)
  Move FCBG to former partners (mne-tools#13205)
  ENH: Update related software list (mne-tools#13202)
  fix sfreq estimation for snirf files (mne-tools#13184)
  ENH: Use data-based padding instead of "odd" padding when filtering in raw.plot (mne-tools#13183)
  FIX: Bumps (mne-tools#13198)
  DOC: fix typo in examples/io/read_impedances.py (mne-tools#13197)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13173)
  FIX make_watershed_bem to handle missing talairach_with_skull.lta courtesy Freesurfer 8 (mne-tools#13172)
  ENH: Add upsampling for MEG helmet surface (mne-tools#13179)
  MAINT: Update code credit (mne-tools#13180)
  BUG: Fix bug with least-squares sphere fit (mne-tools#13178)
  fix EDF export (mne-tools#13174)
  fix typo (mne-tools#13171)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13164)
  Fix dev installation guide (mne-tools#13163)
  ...


Development

Successfully merging this pull request may close these issues.

_extract_sampling_rate function for io.read_raw_snirf incorrectly estimates sample rate due to floating point error.
