-
Notifications
You must be signed in to change notification settings - Fork 36
added stream support to from_btl for issue #142 #146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| name = _basename(fname)[1] | ||
| if name is None: | ||
| name = _basename(fname)[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Forces folks using stream to always pass a name but that is optional, so the API here is kind of broken. (At least lacking documentation.)
Ideally we should check if it is stream and hardcode a default name for those cases.
For that to work we should move this check to _read_file and make it output both f and name there. That way we check for StringIO only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope I'm not over complicating this idea, but did you want _read_file to return a tuple of values, the first being the StringIO object and the second being the file/stream name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. _read_file is an internal use function, so it is OK to "break it" and better than changing the call on from_btl. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an idea.
What if when reading the file, we grab the BTL "File Name" in the header section and use that to set the file name?
* Sea-Bird SBE 9 Data File:
* FileName = C:\CTD_ACQUISITION\2021185HUD\ctddata\185A007.hdr
* Software Version Seasave V 7.26.7.121
* Temperature SN = 5083
* Conductivity SN = 3562
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check if that metadata exists and, if not, fill with a placeholder. But sure, that is probably the best way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what I mean is what if we got the file name in the _parse_seabird() method, if a name hasn't been set already.
| @@ -62,7 +62,7 @@ def btl_duplicate_header_name(): | |||
| def btl_as_stream(): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be a fixture b/c we won't be calling it multiple times in tests.
You can remove the @pytest.fixture and rename it to test_btl_as_stream so pytest can execute it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not use to unittesting specifically with pytest I was just emulating what other tests had done, I'll fix it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry and no need to be sorry. The only reason I'm not sending commit directly to your branch is b/c I believe you want to do this. If not, I can take over. Your contributions and very welcomed and I'm trying to pay back by showing a bit how my messy tests work.
|
@ocefpaf let me know what you think of this update. I'm still mauling it over in my head. I'd prefer not changing default/existing behaviour so as not to accidentally break other people's code. In this version, as discussed, I'm grabbing the name of the file from 'FileName' in the I'm making the assumption here that the names of all the files are the same ,e.g 185A007.BTL, 185A007.ROS, 185A007.hex, etc., so the stem of the name given by 'FileName' will be the same for all files |
Co-authored-by: Filipe <ocefpaf@gmail.com>
tests/test_read.py
Outdated
| def test_btl_with_dup_cols(btl_as_stream): | ||
| assert all(col in btl_as_stream.columns for col in ["Bottle", "Bottle_"]) | ||
|
|
||
|
|
||
| def test_btl_as_stringio(btl_as_stream): | ||
| assert isinstance(btl_as_stream, pd.DataFrame) | ||
| assert not btl_as_stream.empty | ||
|
|
||
|
|
||
| def test_btl_as_stringio_without_name(btl_as_stream): | ||
| assert isinstance(btl_as_stream, pd.DataFrame) | ||
| assert not btl_as_stream.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btl_as_stream is from a stream and without a name. so you don't need the test_btl_as_stringio and test_btl_as_stringio_without_name, only one would suffice.
No description provided.