Skip to content

Conversation

@sappelhoff
Copy link
Member

@sappelhoff sappelhoff commented Sep 10, 2024

I ran into an issue when I tried to export a Raw object to EEGLAB and it expected raw.filenames[0] to be a str, but it was a pathlib.Path.

With the little fix here, such surprises will be minimized, at no real cost for mne, IMHO.

Furthermore, we do not specify in the API, what raw.filenames SHOULD be in terms of a datatype:

https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.filenames

Can somebody tell me the difference between .filenames and ._filenames? For example @cbrnr you are using the private attribute here: https://github.com/cbrnr/mnelab/blob/f5f641ba4376e53f1d8687692b592aecb9beea8d/src/mnelab/io/xdf.py#L154

What is your reasoning for that over using the public one?

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

.filenames is a property without a setter, so I'm directly using the underlying ._filenames because I need to set it.

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

Do we need raw.filenames to be a list of Path, or could we live with forcing it to be a list of str? Then we would not have to cast to str everywhere (I'm assuming this will crop up in other readers as well).

@mscheltienne
Copy link
Member

mscheltienne commented Sep 10, 2024

IMO it's the other way around, we should try to support Paths instead of str.

Edit: Newer readers like the EyeLink or ANT explicitly cast to Path or call _check_fname which cast to Path before filling the filenames property.

@sappelhoff
Copy link
Member Author

.filenames is a property without a setter, so I'm directly using the underlying ._filenames because I need to set it.

Ahh, thanks, found it:

mne-python/mne/io/base.py

Lines 677 to 680 in f3a3ca4

@property
def filenames(self):
"""The filenames used."""
return tuple(self._filenames)

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

IMO it's the other way around, we should try to support Paths instead of str.

Depends on how this property is used. I was thinking that it's just a list of filenames (usually there will be only one filename). I don't think that this is supposed to store the entire path, so I think a str would be the better choice. But correct me if I'm wrong if this attribute serves a different purpose.

@sappelhoff
Copy link
Member Author

I think a tuple of Paths would be good, with additional documentation under:

mne-python/mne/io/base.py

Lines 677 to 680 in f3a3ca4

@property
def filenames(self):
"""The filenames used."""
return tuple(self._filenames)

That documentation could include that obtaining a str of the filename can be as easy as raw.filenames[idx].name.

I think this would be more powerful than just having a tuple of str.

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

I'd still like to know what purpose this attribute serves. As far as I can see, we don't really need it except to print the original file names in a report? So if this is not intended to be a path, why store it as a path?

In any case, I'm not going to argue for it anymore. If you think a Path is better, please go ahead.

Can you also add type information in the class docstring (in addition to the property)?

@sappelhoff
Copy link
Member Author

So if this is not intended to be a path, why store it as a path?

I sometimes use it in personal scripts of mine, to get info that I might store as part of a file path. There are other solutions I could use, of course ... but sometimes this is convenient.

In any case, I'm not going to argue for it anymore. If you think a Path is better, please go ahead.

I also don't have a strong opinion. Happy to follow the majority.

@mscheltienne
Copy link
Member

IIRC, the purpose is to store the files needed to load data from when preload=False. In that sense, a str would be sufficient and you won't run into Windows/Unix issues. I don't know if it supports both relative and absolute paths, absolute would be safer I guess.
IMO, Path are a better solution to handle file or directory names and should be preferred. That said, this one will need a bit of work, I can look into it today if you want. We could cast to Path here:

self._filenames = list(filenames)

Which should convert all our reader code at once. However, we also need to track down lines like this one:

if ("STI 014" in ch_names) and not (self.filenames[0].endswith(".fif")):

Which needs to be converted..

@sappelhoff
Copy link
Member Author

sappelhoff commented Sep 10, 2024

I can look into it today if you want.

That would be much appreciated 🙏

Your suggested changes sound good to me. I would still also improve the documentation of the attribute in the API docs, and perhaps also add a check to the property (raw.filenames) itself to return a tuple of Path ... for cases where people create Raws from scratch and set _filenames themselves with a str.

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

IIRC, the purpose is to store the files needed to load data from when preload=False.

Can you point me to the corresponding code? I didn't find this when grepping for .filenames.

I think we should first establish what purpose this attribute serves, and only then decide what the best representation is. If it is merely for displaying the original file name (this is my best guess at the moment), then I'd go for a string, which is easier and we don't want/need any path-related info then (just the file name).

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

It seems like most readers already use pathlib.Path to fill this list, so I think it is indeed less work to (1) document and (2) enforce this type.

@mscheltienne
Copy link
Member

mscheltienne commented Sep 10, 2024

Here for the FIFF format:

def _read_segment_file(self, data, idx, fi, start, stop, cals, mult):
"""Read a segment of data from a file."""
n_bad = 0
with _fiff_get_fid(self._filenames[fi]) as fid:

The method _read_segment_file differs between readers, but I think always with the same signature where fi is the index of the file within self._filenames from which the segment is read.

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

Everything is using the private ._filenames though. Why do we have .filenames then? It looks like the least invasive change would be to simply cast all items to pathlib.Path in the property then and leave ._filenames as is. WDYT?

@mscheltienne
Copy link
Member

-1 for this since some readers are setting ._filenames as list of str, and some are setting it as list of Path. I'd prefer to get this variable consistent across readers by casting it to one or the other.

@cbrnr
Copy link
Contributor

cbrnr commented Sep 10, 2024

What about adding a setter that takes care of type conversion? Then we could use the public attribute instead of fiddling with the private one.

@mscheltienne
Copy link
Member

mscheltienne commented Sep 10, 2024

I agree that would be cleaner, and removing the private attribute will let the unit test tell us where we need to now use the public property. I suspect that will turn out into a massive change.. while casting here where we set this private attribute will yield a smaller diff:

self._filenames = list(filenames)

I'll give it a shot, le's see how much it is.

@hoechenberger
Copy link
Member

hoechenberger commented Sep 10, 2024

Why don't you use Path.name.endswith() or Path.as_posix().endswith()?

@sappelhoff
Copy link
Member Author

Why don't you use Path.name.endswith() or Path.as_posix().endswith()?

For me it's more about wanting a consistent and documented filenames attribute that is either always a tuple of str, or always a tuple of pathlib.Path (I prefer the latter).

The issue here only arises because it is sometimes one, and sometimes the other ... in particular in user-code or 3rd library code where people may set ._filenames themselves.

@mscheltienne
Copy link
Member

I'm almost done with a draft PR using Path, it wasn't that bad, just a couple more tests to fix where the filename used to load a file is actually a StringIO or BytesIO or other similar IO object, in which case there value in filenames can end up being the string File-like or the repr(...) of the IO object, neither of which is informative.

@hoechenberger
Copy link
Member

The issue here only arises because it is sometimes one, and sometimes the other

ah got you!

@sappelhoff
Copy link
Member Author

@sappelhoff sappelhoff closed this Sep 12, 2024
@sappelhoff sappelhoff deleted the fpath branch September 12, 2024 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants