Bug: read_raw_eyelink does not read all data #12480

@dominikwelke

Description of the problem

Great to see the progress on the eyetracking, @scott-huberty!

I discovered a bug in the code that leads to ignoring about 50% of the data in non-continuous/multi-block recordings.
The root cause is the behavior and current parameter choice of pd.merge_asof() (used to fill in missing time samples).

I'll post a PR to fix it.

Background (for others than Scott):
EyeLink doesn't store sample numbers but timestamps in ms. If the sampling rate is below 1000 Hz (usually the case), it is very likely that later recording blocks start at a millisecond count that does not fall on the grid of the initial block (e.g. with sfreq = 500 Hz: a later block sampled on odd ms while the initial block sampled on even ms).
So to merge these blocks onto a unified timescale, the later block has to be shifted by half a sample or so.
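
To make the misalignment concrete, here is a minimal sketch (not MNE code; `block_offsets` is a hypothetical helper and the block start times are taken from the simplified example in this issue) that computes how far each block's start sits from the sampling grid defined by the first block:

```python
import numpy as np


def block_offsets(block_starts_ms, sfreq):
    """Offset (in ms) of each block start from the grid set by the first block."""
    step = 1000 / sfreq  # sample spacing in ms, e.g. 2.0 ms at 500 Hz
    starts = np.asarray(block_starts_ms, dtype=float)
    # Distance past the most recent grid point; 0 means the block is on the grid
    return (starts - starts[0]) % step


# Blocks starting at 2 ms, 11 ms and 20 ms, recorded at 500 Hz
offsets = block_offsets([2, 11, 20], sfreq=500)
print(offsets)
```

Here the second block is off the grid by 1 ms (half a sample), while the third block happens to land on the grid again.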

pd.merge_asof is used to do that, but the current tolerance is too low to catch the samples that have to be shifted, so they are replaced by NaNs and lost.
In the 500 Hz example, the current tolerance is only 0.2 ms (step / 10, with step = 2 ms), which is not enough to catch the offset of 1.0 ms.
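
The arithmetic can be checked in a few lines. This is only a sketch: `tolerance_catches` is a hypothetical helper, and it assumes the tolerance bound is inclusive, which is what the expected output in this issue suggests.

```python
def tolerance_catches(offset_ms, sfreq, divisor):
    """True if an offset lies within tolerance = step / divisor (inclusive)."""
    step = 1000 / sfreq  # sample spacing in ms
    return offset_ms <= step / divisor


# 1.0 ms offset at 500 Hz: current tolerance (step / 10) vs. proposed (step / 2)
caught_old = tolerance_catches(1.0, sfreq=500, divisor=10)  # tolerance 0.2 ms
caught_new = tolerance_catches(1.0, sfreq=500, divisor=2)   # tolerance 1.0 ms
print(caught_old, caught_new)
```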

Below is a simplified example:

Steps to reproduce

# replicate bug
import numpy as np
import pandas as pd

sfreq = 500
time_col = "time"

df = pd.DataFrame({
    time_col:[2,4,6,11,13,15,20,22,24],
    "data":[2,4,6,11,13,15,20,22,24]})


# mimic current _adjust_times function
first, last = df[time_col].iloc[[0, -1]]
step = 1000 / sfreq
df[time_col] = df[time_col].astype(float)
new_times = pd.DataFrame(
    np.arange(first, last + step / 2, step), columns=[time_col]
)
# critical line below
return_current = pd.merge_asof(
    new_times, df, on=time_col, direction="nearest", tolerance=step / 10
    )

print("current implementation:")
print(return_current)
print()


# fixed alternatives
return_new = pd.merge_asof(
    new_times, df, on=time_col, direction="nearest", tolerance=step / 2
    )
print("fixed (nearest):")
print(return_new)
print()

return_new = pd.merge_asof(
    new_times, df, on=time_col, direction="backward", tolerance=step / 2
    )
print("fixed (backwards):")
print(return_new)
print()

Link to data

No response

Expected results

    time  data
0    2.0   2.0
1    4.0   4.0
2    6.0   6.0
3    8.0   NaN
4   10.0   NaN
5   12.0  11.0
6   14.0  13.0
7   16.0  15.0
8   18.0   NaN
9   20.0  20.0
10  22.0  22.0
11  24.0  24.0

Actual results

    time  data
0    2.0   2.0
1    4.0   4.0
2    6.0   6.0
3    8.0   NaN
4   10.0   NaN
5   12.0   NaN
6   14.0   NaN
7   16.0   NaN
8   18.0   NaN
9   20.0  20.0
10  22.0  22.0
11  24.0  24.0
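
The difference between the two tables can be summarized by counting how many of the original samples survive the merge. The sketch below is not the MNE implementation; `count_retained` is a hypothetical helper that mimics the merge step from the reproduction code, using direction="backward" as in the second proposed alternative.

```python
import numpy as np
import pandas as pd


def count_retained(times_ms, sfreq, tolerance, direction="backward"):
    """Count original samples that survive merging onto a regular time grid."""
    step = 1000 / sfreq
    df = pd.DataFrame({"time": np.asarray(times_ms, dtype=float)})
    df["data"] = df["time"]
    first, last = df["time"].iloc[[0, -1]]
    # Regular grid from the first to the last timestamp
    grid = pd.DataFrame({"time": np.arange(first, last + step / 2, step)})
    merged = pd.merge_asof(
        grid, df, on="time", direction=direction, tolerance=tolerance
    )
    return int(merged["data"].notna().sum())


times = [2, 4, 6, 11, 13, 15, 20, 22, 24]
step = 1000 / 500
retained_old = count_retained(times, 500, tolerance=step / 10)
retained_new = count_retained(times, 500, tolerance=step / 2)
print(retained_old, retained_new, len(times))
```

In this toy example the middle block (3 of 9 samples) is dropped with the current tolerance and kept with step / 2.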

Additional information

Doesn't matter.
