Skip to content

refactor(datafile): use len(obj) rather than obj.get_nrecords()#2215

Merged
wpbonelli merged 1 commit into
modflowpy:developfrom
mwtoews:fix-nrecords-to-len
Jun 11, 2024
Merged

refactor(datafile): use len(obj) rather than obj.get_nrecords()#2215
wpbonelli merged 1 commit into
modflowpy:developfrom
mwtoews:fix-nrecords-to-len

Conversation

@mwtoews
Copy link
Copy Markdown
Contributor

@mwtoews mwtoews commented Jun 7, 2024

This PR has a few aims related to data files, including FormattedHeadFile, HeadFile and CellBudgetFile.

These files have a "number of records" property that was implemented with get_nrecords(). This "length of the object" measure is more naturally done with __len__, i.e. len(headsfile).

It is advised to prefer the len() approach, so instances of get_nrecords() for these files show a DeprecationWarning.

The CellBudgetFile also has a .nrecords property. It is also advised to show a DeprecationWarning with this property.

This PR also fixes a bug with get_nrecords() shown here:

def get_nrecords(self):
if isinstance(self.recordarray, np.recarray):
return self.recordarray.shape[0]
return 0

the bug is that recordarray is a structured array (np.ndarray), not a record array, so this always silently returns 0. This bug does not apply to CellBudgetFile, which worked fine and matches obj.nrecords.

Fixing this bug caused this test to fail, since the for-loop was never activated. A "todo" note is added since the reversed header is not the same as the original header.

Another aim of this PR is to re-organize a few CellBudgetFile tests from test_binaryfile.py to test_cellbudgetfile.py. Most of this is copied with perhaps minor simplifications.

Note that none of these changes apply to flopy.utils.swroutputfile.SwrBudget.get_nrecords(), which returns a tuple.


There is also still room for discussion if get_nrecords() or a dynamic .nrecords property should be preferred over len(obj). This PR can be adjusted accordingly. Opinions welcome!

@christianlangevin
Copy link
Copy Markdown

I'm in favor of len(obj). Sadly, len doesn't correspond to the number of times the data is available, but rather the number of records, which are written by layer with DIS/DISV. Whatever we can do to make our objects as pythonic as possible seems to be a step in the right direction. The recordarray misnomer is something that has been propagated for years due to a misunderstanding on our (my) part regarding the differences between structured arrays and record arrays -- something to keep in mind with a needed refactoring.

@wpbonelli
Copy link
Copy Markdown
Member

Sadly, len doesn't correspond to the number of times the data is available, but rather the number of records, which are written by layer with DIS/DISV.

Agreed len() suggests to me the number of rows in tabular data, as in numpy and pandas. A couple thoughts on this (hopefully not too off-base — maybe I have misunderstood something)

  • Flopy's approach to model outputs is to provide classes wrapping MODFLOW/etc output files, which may contain several tables of different shapes. This could be seen as a leaky abstraction. Does flopy really need to reproduce MODFLOW output file structure in user-facing APIs?
  • Flopy uses "record" to mean "table of some kind of data", where it commonly means a single row of tabular data, so "number of records"/"length of the object" really means "number of separate tables in this output file"

Maybe a future flopy could consider static routines to read model outputs e.g.

flopy.read_binary_file("path/to/file.cbc", data="FLOW RIGHT FACE")

File classes could live on under the hood, e.g. for indexing/caching, precision detection, etc, but it seems ideal to try to hide this and return one table at a time? The user would still need to know which sort of data can be found in which output file, but even that seems like it could be hidden with suitable shortcuts on model.output.

In the meantime I like len(obj) as sugar for get_nrecords() though I wonder if there is some way to emphasize this is not a row (i.e. time) count

@mwtoews
Copy link
Copy Markdown
Contributor Author

mwtoews commented Jun 11, 2024

Reply to a few points of discussion...

Agreed len() suggests to me the number of rows in tabular data, as in numpy and pandas.

Xref to #2221 which adds a .headers data frame with this number of headers.

Maybe a future flopy could consider static routines to read model outputs e.g.

flopy.read_binary_file("path/to/file.cbc", data="FLOW RIGHT FACE")

With CellBudget files, a somewhat expensive component is to build an index of headers, which is ideally done once, then re-used to get data from more than one component. This process wouldn't work well with static routines.

In the meantime I like len(obj) as sugar for get_nrecords() though I wonder if there is some way to emphasize this is not a row (i.e. time) count

I was considering keeping/adding obj.nrecords to be the same as len(obj), but the docstrings for __len__ should be sufficient without adding another property.

@wpbonelli
Copy link
Copy Markdown
Member

@mwtoews can you resolve the datafile.py conflict? then we can bring this in

@mwtoews mwtoews force-pushed the fix-nrecords-to-len branch from 5025802 to 2e97b8e Compare June 11, 2024 21:31
@wpbonelli wpbonelli merged commit e2d16df into modflowpy:develop Jun 11, 2024
@mwtoews mwtoews deleted the fix-nrecords-to-len branch June 11, 2024 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants