Skip to content

Conversation

@upsonp
Copy link
Contributor

@upsonp upsonp commented May 19, 2022

No description provided.


name = _basename(fname)[1]
if name is None:
name = _basename(fname)[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Forces folks using stream to always pass a name but that is optional, so the API here is kind of broken. (At least lacking documentation.)

Ideally we should check if it is stream and hardcode a default name for those cases.

For that to work we should move this check to _read_file and make it output both f and name there. That way we check for StringIO only once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I'm not over complicating this idea, but did you want _read_file to return a tuple of values, the first being the StringIO object and the second being the file/stream name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. _read_file is an internal use function, so it is OK to "break it" and better than changing the call on from_btl. Does that make sense?

Copy link
Contributor Author

@upsonp upsonp May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an idea.

What if when reading the file, we grab the BTL "File Name" in the header section and use that to set the file name?

* Sea-Bird SBE 9 Data File:
* FileName = C:\CTD_ACQUISITION\2021185HUD\ctddata\185A007.hdr
* Software Version Seasave V 7.26.7.121
* Temperature SN = 5083
* Conductivity SN = 3562
 ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check if that metadata exists and, if not, fill with a placeholder. But sure, that is probably the best way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what I mean is what if we got the file name in the _parse_seabird() method, if a name hasn't been set already.

@@ -62,7 +62,7 @@ def btl_duplicate_header_name():
def btl_as_stream():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be a fixture b/c we won't be calling it multiple times in tests.
You can remove the @pytest.fixture and rename it to test_btl_as_stream so pytest can execute it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not use to unittesting specifically with pytest I was just emulating what other tests had done, I'll fix it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry and no need to be sorry. The only reason I'm not sending commit directly to your branch is b/c I believe you want to do this. If not, I can take over. Your contributions and very welcomed and I'm trying to pay back by showing a bit how my messy tests work.

@upsonp
Copy link
Contributor Author

upsonp commented May 24, 2022

@ocefpaf let me know what you think of this update.

I'm still mauling it over in my head. I'd prefer not changing default/existing behaviour so as not to accidentally break other people's code.

In this version, as discussed, I'm grabbing the name of the file from 'FileName' in the _parse_seabird() function. if the metadata['name'] value isn't set there then I default to the old behaviour.

I'm making the assumption here that the names of all the files are the same ,e.g 185A007.BTL, 185A007.ROS, 185A007.hex, etc., so the stem of the name given by 'FileName' will be the same for all files

Co-authored-by: Filipe <ocefpaf@gmail.com>
Comment on lines 88 to 99
def test_btl_with_dup_cols(btl_as_stream):
assert all(col in btl_as_stream.columns for col in ["Bottle", "Bottle_"])


def test_btl_as_stringio(btl_as_stream):
assert isinstance(btl_as_stream, pd.DataFrame)
assert not btl_as_stream.empty


def test_btl_as_stringio_without_name(btl_as_stream):
assert isinstance(btl_as_stream, pd.DataFrame)
assert not btl_as_stream.empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btl_as_stream is from a stream and without a name. so you don't need the test_btl_as_stringio and test_btl_as_stringio_without_name, only one would suffice.

@ocefpaf ocefpaf merged commit 628a458 into pyoceans:main May 31, 2022
@ocefpaf ocefpaf mentioned this pull request May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants