
Conversation

@drammock
Member

@drammock drammock commented Apr 3, 2025

closes #12508

@kdarti I think this will fix the sampling rate estimation problem for snirf files. Can you provide the file that was showing the problem for you, so I can test it / write a test that emulates that case?

@drammock drammock changed the title fix sfreq estimation [ci skip] fix sfreq estimation for snirf files Apr 3, 2025
@kdarti

kdarti commented Apr 4, 2025

closes #12508

@kdarti I think this will fix the sampling rate estimation problem for snirf files. Can you provide the file that was showing the problem for you, so I can test it / write a test that emulates that case?

Thanks for the fix, I attached the file to the issue, here's the link again: https://github.com/mne-tools/mne-python/files/14665643/WR11.Baseline.SNIRF.zip

@drammock
Member Author

drammock commented Apr 4, 2025

Thanks @kdarti (and sorry I missed the fact that you'd already provided it!)

In fact the changes I've made so far will not address your problem, as the times in your file don't even yield a single estimate of period length when rounding to 4 decimals (so this file triggers the else branch, which I haven't modified here). I poked around with the times in that file and here's what I find:

import matplotlib.pyplot as plt
import numpy as np
from h5py import File
from tabulate import tabulate

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    times = np.array(fid["nirs/data1/time"])

periods = np.diff(times)
sfreqs = 1 / periods
uniq_sfreq, counts = np.unique(sfreqs, return_counts=True)
print(tabulate(list(zip(uniq_sfreq, counts)), headers=("sfreq", "count"), floatfmt=(".6f",)))
#     sfreq    count
# ---------  -------
# 49.951218    24433
# 49.989323     6016
# 49.998856     2232
# 49.999451        8
# 49.999748        8
# 49.999897        8
# 49.999973        9
# 49.999992        2
# 50.000000        4
# 50.000011        6
# 50.000050      347
# 50.001240      192
# 50.003624      192
# 50.008392     1984
# 50.027481     9984
# 50.103977     8192
# 50.257668      460

sfreqs.mean().item()  # mean
# 50.00006866455078
np.median(sfreqs).item()  # median
# 49.989322662353516
uniq_sfreq[np.argmax(counts)].item()  # mode
# 49.95121765136719

x = np.arange(len(sfreqs))
fig, ax = plt.subplots(layout="constrained")
ax.hlines(50, 0, len(sfreqs), colors='r', linestyles="dotted")
ax.scatter(x, sfreqs, linewidths=0, s=4)

[Figure 1: per-sample sfreq estimates scattered against sample index, with a dotted red line at 50 Hz]

Put simply: any sane way I can think of to estimate the sfreq based on this vector of times is going to yield something that is not exactly 50 Hz.

@larsoner
Member

larsoner commented Apr 4, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

@drammock
Member Author

drammock commented Apr 4, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

How would that work exactly? Would we take their passed-in value and just fake the time vector as np.arange(len(times_in_the_snirf_file)) / passed_in_sfreq? That seems error-prone to me (as in, if the user was wrong about the sfreq, or had a typo, their results could look very different).

I guess I also don't see what the problem is with our estimate of the sfreq being off by a tiny fraction of a Hz from the hardware's nominal sampling frequency. To be clear: I think the change in this PR is a good one, but it turns out to be unrelated to the problem reported in #12508, and it's not clear to me what (if anything) should be done about #12508.
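For concreteness, the approach being proposed would presumably look something like this (a hypothetical sketch; `fake_time_vector` is a made-up name, not anything in the reader code):

```python
import numpy as np


def fake_time_vector(n_samples, sfreq):
    """Hypothetical helper: rebuild an evenly spaced time vector from a
    user-supplied sampling frequency, ignoring the file's stored times."""
    return np.arange(n_samples) / sfreq


times = fake_time_vector(5, 50.0)
print(times.tolist())  # [0.0, 0.02, 0.04, 0.06, 0.08]
```

If the user passes a wrong sfreq here, every downstream timestamp is silently wrong, which is the error-proneness concern.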

@larsoner
Member

larsoner commented Apr 4, 2025

Would we take their passed-in value and just fake the time vector as...

Yep. And we could still do the calculations, and if the result is off by rtol=0.01 or something, at least emit a warning (which it wouldn't be for the dataset above). I don't think it would be overly error-prone since people would need to set it explicitly, and we could emit a warning if it's wrong. If we wanted to be really careful we could add an on_mismatch="raise" option, but since they have to pass a value explicitly to activate that functionality it seems a bit overkill.
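For concreteness, that check might look something like this sketch (`check_user_sfreq` and its signature are made up for illustration, not anything in the reader code):

```python
import warnings

import numpy as np


def check_user_sfreq(user_sfreq, times, rtol=0.01):
    """Hypothetical check: warn if a user-supplied sfreq disagrees with
    the estimate from the file's time vector by more than rtol."""
    estimated = 1.0 / np.mean(np.diff(times))
    if not np.isclose(user_sfreq, estimated, rtol=rtol):
        warnings.warn(
            f"User-supplied sfreq {user_sfreq} differs from the estimated "
            f"{estimated:.3f} Hz by more than {rtol:.0%}"
        )
    return user_sfreq


times = np.arange(100) * 0.02  # an (almost exactly) 50 Hz grid
check_user_sfreq(50.0, times)  # within tolerance, so silent
```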

I guess I also don't see what the problem is with our estimate of the sfreq being off by a tiny fraction of a Hz from the hardware's nominal sampling frequency

I think that plotting and time picking etc. all get a bit ugly/worse for the end user. Instead of having nicely formatted values you'll have ugly/long ones like we have for sample. Not a deal breaker or anything, just slightly less nice an experience.

If the data values were unevenly sampled, we could consider allowing a linear interpolation back to an even grid. It would probably be good enough assuming the data are sufficiently lowpassed. But given how small the deviations are above, I doubt it would change anything.
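A hedged sketch of what that interpolation could look like with `np.interp`, on synthetic jittered data (not anything this PR implements):

```python
import numpy as np

rng = np.random.default_rng(0)
times = np.arange(100) * 0.02 + rng.normal(0, 1e-4, size=100)  # jittered 50 Hz grid
times.sort()  # jitter is tiny, but guarantee monotonicity for np.interp
data = np.sin(2 * np.pi * times)  # a 1 Hz signal, far below Nyquist

# resample onto an even 50 Hz grid via linear interpolation
grid = np.arange(times[0], times[-1], 1 / 50.0)
data_even = np.interp(grid, times, data)
```

Because the signal is well below Nyquist, the interpolated values stay very close to the true waveform on the even grid.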

@drammock
Member Author

drammock commented Apr 4, 2025

ok that makes sense. Will adapt this PR accordingly

@kdarti

kdarti commented Apr 7, 2025

Should we add a sfreq=None which is the current (or this improved) behavior, and for weird files like this people can pass sfreq=50. for example?

Again, I'd like to note that there is nothing weird with the file: all values in /nirs/data1/time have 2 decimals and increment by 0.02. If you look up where the weird values in periods occur, you'll see there are no weird values in time at those points. As mentioned in the issue, this is about float errors.

Unless people specify the time vector using only start and end time (which is the other allowed option), this file is nothing out of the ordinary. Note that this isn't super important to fix, as the deviation from the true sample rate specified by /nirs/data1/time in the SNIRF file is small, but it is still wrong. With that said, it could be solved with fractions.
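To illustrate the fractions idea, here's a sketch (not what this PR implements): snap each float period to a limited-denominator rational before inverting.

```python
from fractions import Fraction

import numpy as np

# float32 periods like those stored in the file, upcast to float64
periods = np.full(5, np.float32(0.02)).astype(np.float64)
print(periods[0])  # 0.019999999552965164

# snapping to a small-denominator rational recovers the nominal period
snapped = [Fraction(p).limit_denominator(1000) for p in periods]
print(snapped[0])  # 1/50
print(float(1 / snapped[0]))  # 50.0
```

`limit_denominator` finds the closest fraction with denominator at most 1000, so the float32 rounding error is absorbed and the exact nominal rate falls out.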

@drammock
Member Author

drammock commented Apr 7, 2025

Again, I'd like to note that there is nothing weird with the file, all values in /nirs/data1/time have 2 decimals and values increment by 0.02.

@kdarti can you show me how you came to this conclusion? As far as I can tell, the data in the file are float32 values and are not exactly incrementing by 0.02:

import numpy as np
from h5py import File

with File("WR11 Baseline SNIRF.snirf", "r") as fid:
    print(fid["nirs/data1/time"].dtype)
    times = fid["nirs/data1/time"][:]
    print("\nraw times:")
    print(times)
    print("\nunique period lengths:")
    with np.printoptions(precision=12):
        print(np.unique(times[1:] - times[:-1]))
# float32
# 
# raw times:
# [0.00000e+00 2.00000e-02 4.00000e-02 ... 1.08150e+03 1.08152e+03
#  1.08154e+03]
# 
# unique period lengths:
# [0.019897461 0.019958496 0.019989014 0.019996643 0.01999855  0.019999504
#  0.01999998  0.019999996 0.02        0.020000003 0.02000001  0.02000004
#  0.0200001   0.02000022  0.020000458 0.020004272 0.020019531]

edited code sample to make it clearer

@drammock
Member Author

drammock commented Apr 7, 2025

Put another way: there's little we can do to work around how the values are stored in the HDF5 file. They're stored as float32 values, there's some inherent imprecision:

np.float32(0.02).as_integer_ratio()
# (5368709, 268435456)   # <---- notice it's not (1, 50)

np.float32(0.02).astype(np.float64)
# np.float64(0.019999999552965164)

np.float32(0.02).data.tolist()  # this upcasts to native python `float` (64-bit)
# 0.019999999552965164
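The flip side, as a quick sketch: once you upcast to float64 and round to a few decimals, the nominal period is recoverable, which is roughly what rounding-based estimates rely on.

```python
import numpy as np

period64 = float(np.float32(0.02))  # 0.019999999552965164
print(round(period64, 4))           # 0.02
print(1.0 / round(period64, 4))     # 50.0
```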

@kdarti

kdarti commented Apr 9, 2025

Again, I'd like to note that there is nothing weird with the file, all values in /nirs/data1/time have 2 decimals and values increment by 0.02.

@kdarti can you show me how you came to this conclusion? As far as I can tell, the data in the file are float32 values and are not exactly incrementing by 0.02:


My bad, I was in a bit too much of a rush. So with 32-bit floats this will always be the case, since float32 is allowed by the SNIRF spec. But since it still passes the jitter/variable-sample-rate check, it's not really a problem then.

@drammock drammock marked this pull request as ready for review April 9, 2025 16:59
Comment on lines -544 to -545
# specified as onset, samplerate
sampling_rate = 1.0 / (time_data[1] - time_data[0])
Member Author

this was a bug. length-2 time_data is (onset, period) so the correct sampling rate would have been 1.0 / time_data[1]
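A minimal sketch of the difference (illustrative values only, not the actual reader code):

```python
import numpy as np

# length-2 time vector per the SNIRF spec: (onset, period)
time_data = np.array([0.0, 0.02])
old = 1.0 / (time_data[1] - time_data[0])  # only correct because onset == 0
new = 1.0 / time_data[1]
print(old == new)  # True, masking the bug

# with a nonzero onset, the old formula falls apart:
time_data = np.array([5.0, 0.02])
print(1.0 / (time_data[1] - time_data[0]))  # ~ -0.2 (nonsense)
print(1.0 / time_data[1])                   # 50.0
```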

uniq_periods = np.unique(periods.round(decimals=4))
if uniq_periods.size == 1:
    # Uniformly sampled data
    sampling_rate = 1.0 / uniq_periods.item()
Member Author

this was also a bug. we shouldn't have used the rounded-to-4-decimals value. Now we use the mean of per-period sfreq estimates
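Standalone sketch of the new behavior (not the exact reader code), using a synthetic float32 time grid like the file above:

```python
import numpy as np

# synthetic float32 time grid like the one stored in the file
times = np.arange(1000, dtype=np.float32) * np.float32(0.02)
periods = np.diff(times).astype(np.float64)

# old: invert a single period rounded to 4 decimals (limited precision)
# new: mean of per-period sfreq estimates, no rounding
sfreq = np.mean(1.0 / periods).item()
print(sfreq)  # ~50.0, within float32 precision
```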

Comment on lines +556 to +557
mean_period = np.mean(np.diff(orig_time))
acceptable_time_jitter[-1] += 0.0099 * mean_period
Member Author

just renaming variable for clarity

assert "Found jitter of 0.9" in lines

# Add jitter of 1.01%, which is greater than allowed tolerance
# Add jitter of 1.02%, which is greater than allowed tolerance
Member Author

1.01% no longer triggered the error, possibly because float imprecision meant that the final period was a bit shorter than expected already.

@drammock
Copy link
Member Author

drammock commented Apr 9, 2025

all green! ready for review. Note that I put in one not-yet-merged suggestion through the GitHub UI, to change the docstring in a new test.

@larsoner larsoner left a comment
Member

Looks good, just a couple minor docstring tweaks. I'll commit those and yours with a [ci skip] and merge, thanks @drammock !

larsoner and others added 2 commits April 9, 2025 15:19
@larsoner larsoner enabled auto-merge (squash) April 9, 2025 19:20
@larsoner
Copy link
Member

larsoner commented Apr 9, 2025

... decided just to let CIs work their magic and mark for merge-when-green instead

@larsoner larsoner merged commit 729bdf3 into mne-tools:main Apr 9, 2025
30 checks passed
@drammock drammock deleted the snirf branch April 9, 2025 21:32
larsoner added a commit to SYXiao2002/mne-python that referenced this pull request Apr 18, 2025
* upstream/main: (40 commits)
  fix typo (missing space) that messed up rst rendering (mne-tools#13217)
  MAINT: Restore VTK dev (mne-tools#13214)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13212)
  BUG: Fix bug with example (mne-tools#13210)
  MAINT: Fix pip-pre with PyVista (mne-tools#13207)
  Move FCBG to former partners (mne-tools#13205)
  ENH: Update related software list (mne-tools#13202)
  fix sfreq estimation for snirf files (mne-tools#13184)
  ENH: Use data-based padding instead of "odd" padding when filtering in raw.plot (mne-tools#13183)
  FIX: Bumps (mne-tools#13198)
  DOC: fix typo in examples/io/read_impedances.py (mne-tools#13197)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13173)
  FIX make_watershed_bem to handle missing talairach_with_skull.lta courtesy Freesurfer 8 (mne-tools#13172)
  ENH: Add upsampling for MEG helmet surface (mne-tools#13179)
  MAINT: Update code credit (mne-tools#13180)
  BUG: Fix bug with least-squares sphere fit (mne-tools#13178)
  fix EDF export (mne-tools#13174)
  fix typo (mne-tools#13171)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13164)
  Fix dev installation guide (mne-tools#13163)
  ...


Development

Successfully merging this pull request may close these issues.

_extract_sampling_rate function for io.read_raw_snirf incorrectly estimates sample rate due to floating point error.
