Skip to content

Comments

Expand reader tests#346

Merged
cc-a merged 12 commits intodevelopfrom
reader-test-improvements
Jun 25, 2024
Merged

Expand reader tests#346
cc-a merged 12 commits intodevelopfrom
reader-test-improvements

Conversation

@cc-a
Copy link
Collaborator

@cc-a cc-a commented Jun 18, 2024

Description

As prepatory work for #332 this PR expands the tests for the reader functions in readers/csv.py to more strictly codify the expected output. So far I've only looked at tests/test_readers.py::test_read_trade_technodata and wanted to get initial feedback before working on others. Particularly interested on anything else about the structure of the dataset that could be checked.

Type of change

Please add a line in the relevant section of
CHANGELOG.md to
document the change (include PR #) - note reverse order of PR #s.

  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass: $ python -m pytest
  • The documentation builds and looks OK: $ python -m sphinx -b html docs docs/build

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@cc-a cc-a requested review from alexdewar and tsmbland June 18, 2024 15:27
Copy link
Collaborator

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible!

The only problem is that the tests are all failing because of that numpy issue 😞.

@tsmbland Is #312 ready to merge or should we open a separate PR to pin the numpy version in the meantime?

@tsmbland
Copy link
Collaborator

This looks good! I don't think there's anything else that needs to be checked

I've realised I don't actually know what the TradeTechnodata.csv file is for as it isn't included in the default example which is what I've been working off, so the data for this specific example probably isn't included in the schema for the new files. You'll want to base future functions off the default model (i.e. copy_model("default", tmp_path)) where possible.

In theory #312 is done, but I'll leave it up until after the standup tomorrow in case anyone else wants to review it. It would also be easy just to pin the numpy version in this PR or a separate one.

@codecov
Copy link

codecov bot commented Jun 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.36%. Comparing base (5ab22bb) to head (9d91b50).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #346   +/-   ##
========================================
  Coverage    71.36%   71.36%           
========================================
  Files           44       44           
  Lines         5887     5887           
  Branches      1162     1162           
========================================
  Hits          4201     4201           
  Misses        1365     1365           
  Partials       321      321           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cc-a cc-a force-pushed the reader-test-improvements branch from 536567d to 88b8285 Compare June 24, 2024 14:04
@cc-a cc-a changed the title Expand test_read_trade_technodata to additional dataset attributes Expand reader tests Jun 24, 2024
@cc-a
Copy link
Collaborator Author

cc-a commented Jun 24, 2024

This PR attempts attempts a comprehensive pass at the reader functions in an attempt to more rigourously define the data model used within Muse. Where possible reader functions are tested against the default example model using a single example data file. Other models are used where necessary. Some functions it was not possible/necessary to test for the reason given.

Reader functions tested against default model (and the data file used):

  • read_technodictionary - "technodata/residential/Technodata.csv"
  • read_io_technodata - "technodata/residential/CommOut.csv"
  • read_initial_assets - "technodata/residential/ExistingCapacity.csv"
  • read_global_commodities - "input/GlobalCommodities.csv"
  • read_csv_agent_parameters - "technodata/Agents.csv"
  • read_initial_market - "input/Projections.csv"
  • test_read_attribute_table - "input/Projections.csv"

Reader functions tested against other models:

  • read_technodata_timeslices - default_timeslice - "technodata/power/TechnodataTimeslices.csv"

Reader funtions not tested:

  • read_initial_capacity - called from multiple examples but is always passed an array and never reads data from a file
  • read_technologies - wraps other reader functions, does not read data directly
  • read_csv_timeslices - not called by any example model
  • read_timeslice_shares - not called by any example model
  • read_macro_drivers - not called by any example

Potentially these last may need to be reviewed as to whether they are still relevant or could be removed.

@cc-a cc-a requested review from alexdewar and tsmbland June 24, 2024 14:21
@tsmbland
Copy link
Collaborator

This looks great, although I do worry the type checks might be a bit restrictive. I guess all the types are inferred based on the input data, and what you've got here is correct for the datasets you're testing.

However, for example, all of the variables with int64 dtype could also be floats, and if we were using a database with a pre-defined shema, we would store them as such. So we'll run into problems if we want to use these tests for test-driven development of the database (which I think was ultimately the plan). Not sure what the best solution is here.

Similar thing with the string fields, as we've got a mix of O, <U4, <U11 etc., when I think really we just want to check that these are strings.

@cc-a
Copy link
Collaborator Author

cc-a commented Jun 24, 2024

Thanks @tsmbland. You make a good point. I think for your current purposes recording the tightest possible type makes sense particularly as it's hard to know how interchangeable e.g. ints and floats are in the code. But we should think carefully about how we tightly we specify types when writing tests for the planned changes.

Copy link
Collaborator

@tsmbland tsmbland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think we can merge what you've got here and create a separate branch for the new inputs

@cc-a cc-a marked this pull request as ready for review June 25, 2024 13:18
@cc-a cc-a merged commit e09a23d into develop Jun 25, 2024
@cc-a cc-a deleted the reader-test-improvements branch June 25, 2024 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants