Scripts to generate tutorial models #279
Conversation
dalonsoa
left a comment
I love it! This sort of wizard would make the tutorials more robust to changes and life much easier for people (especially newcomers). Power users can always edit things manually, but the entry level should be as low as possible.
I'm more than happy with this approach. Let's see what @alexdewar says.
On the input data, I totally agree that the current structure of CSV files is horrible, but I don't feel a database is an option. Inputs are often put together in Excel and, in any case, should be human-readable.
I'd suggest you code the first tutorial only, for now, to see what things would look like in practice, and then we can decide on the best way forward.
I love it too! I think a wizard along the lines you're suggesting is eminently sensible. As we talked about in the meeting with Adam, I think it would be nice to have a tool for setting input parameters in general, even when not copying from an existing template -- as someone who understands MUSE fairly poorly atm I'm not sure if it makes sense for this to be a separate task? Another thing is: do we want the option of having a graphical interface for this? Or just a terminal one? A graphical interface may be more user-friendly (particularly for non-technical MUSE users), but it isn't necessarily something to do now. I guess our options are:
I'm leaning towards 3.
I've had a go at this for the first tutorial. See the folder, in which I've marked all the steps and created a file. For now though, do you think it's worth going ahead with this for the other tutorials?
dalonsoa
left a comment
I've given this a try and it works really well. I think the main benefit is that it ensures consistency with the base model (the default one) so if one changes, the rest would be updated automatically.
A couple of questions I have are:
- If another tutorial builds on the model generated by, let's say, scenario 1, would they need a generate file that builds everything from scratch, or can they actually take the files for scenario 1 as the starting point?
- When running the generate tutorial script, I get something slightly different to what was already in the repo. See attached. It's merely a formatting thing - it does not affect functionality - but it will freak out the version control system because that file will look modified. It happens whenever there's a list. I guess that when the toml file is saved, it follows some convention which is different to the convention it originally had. Or it might just be my computer...

| """ | ||
| model_name = "1-introduction" | ||
| parent_path = os.path.dirname(os.path.abspath(sys.argv[0])) |
Are you trying to get the directory of this file? If so, maybe I would do:

```python
from pathlib import Path
...
parent_path = Path(__file__).parent
```

Unless there's a reason why pathlib doesn't work here.
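As a quick standalone sanity check (not part of the wizard; the path is purely illustrative), the pathlib form and the os.path form give the same parent directory for a given path string:

```python
import os.path
from pathlib import Path

# hypothetical script path, just for illustration
script = "/home/user/MUSE_OS/docs/generate_model.py"

# pathlib version suggested above
parent_pathlib = Path(script).parent

# os.path version used in the original code
parent_os = os.path.dirname(script)

print(parent_pathlib.as_posix())  # → /home/user/MUSE_OS/docs
print(parent_os)                  # → /home/user/MUSE_OS/docs
```

One difference worth noting: `__file__` always refers to the module itself, whereas `sys.argv[0]` is whatever script was invoked, which may differ if the module is imported rather than run directly.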
I think this should be possible, but we would just have to make sure the scripts are run in the correct order in a loop (and not in parallel). Should be straightforward, although I haven't looked into this yet.
Have you tried running pre-commit? I think this should get rid of the diffs, but I guess it's still quite annoying. In the long run, though, we'll probably add these files to gitignore, so it shouldn't be a problem.
I've not run pre-commit as I had nothing to commit. But you're right - if these files are generated on the fly, there's no reason to keep them in the repo, so we could just gitignore them.
I've gone through and written the necessary scripts for all the tutorials. Within the tutorial-code folder you'll find two new files: generate_models.py and run_models.py.
I wonder if there's a better way of doing these things that doesn't require all these files to be committed? There is still a lot of work to be done on the contents of the notebooks, as there are still many inconsistencies in the text. This PR is more about having a framework in place to generate the model input files programmatically, and I think I'll tackle the contents of the notebooks separately.
dalonsoa
left a comment
This looks really good. A massive piece of work!
I've just added a few comments and flagged the need for tests for the functions in the wizard.
alexdewar
left a comment
This looks great! Sorry it took me so long to get round to reviewing it...
I've made some comments about lists and generators, but it's more of a note for future. Only change it if you can be bothered.
src/muse/wizard.py
```python
files_to_update = [
    model_path / file
    for file in [
        f"technodata/{sector}/CommIn.csv",
        f"technodata/{sector}/CommOut.csv",
        "input/BaseYearImport.csv",
        "input/BaseYearExport.csv",
        "input/Projections.csv",
    ]
] + list((model_path / "technodata/preset").glob("*"))
```
You could also join these two together with itertools.chain and avoid making all those lists, but it's not particularly important.
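For illustration, a hedged sketch of what the itertools.chain version might look like; `sector` and `model_path` are stand-in values here, not the wizard's real inputs:

```python
from itertools import chain
from pathlib import Path

sector = "residential"      # hypothetical sector name
model_path = Path("model")  # hypothetical model directory

# chain concatenates the two iterables lazily, so no intermediate
# list is built for either part
files_to_update = chain(
    (
        model_path / file
        for file in (
            f"technodata/{sector}/CommIn.csv",
            f"technodata/{sector}/CommOut.csv",
            "input/BaseYearImport.csv",
            "input/BaseYearExport.csv",
            "input/Projections.csv",
        )
    ),
    (model_path / "technodata/preset").glob("*"),
)

files = list(files_to_update)
print(files[:2])
```

If the result is only ever iterated once, the surrounding list() call can be dropped too and the chain consumed directly in a for loop.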
src/muse/wizard.py
```python
files_to_update = [
    model_path / file
    for file in [
        f"technodata/{sector}/CommIn.csv",
        f"technodata/{sector}/CommOut.csv",
        f"technodata/{sector}/ExistingCapacity.csv",
        f"technodata/{sector}/Technodata.csv",
    ]
]
```
If you use round brackets you can leave this as a generator expression and then Python doesn't have to create a list object:
```python
files_to_update = (
    model_path / file
    for file in (
        f"technodata/{sector}/CommIn.csv",
        f"technodata/{sector}/CommOut.csv",
        f"technodata/{sector}/ExistingCapacity.csv",
        f"technodata/{sector}/Technodata.csv",
    )
)
```
Again, just a nit, so not important
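A minimal standalone illustration of the difference (unrelated to the wizard itself): a list comprehension materialises every element up front, while a generator expression produces them on demand and can only be consumed once:

```python
# list comprehension: all elements exist immediately
squares_list = [n * n for n in range(5)]
print(squares_list)        # → [0, 1, 4, 9, 16]

# generator expression: elements are produced lazily
squares_gen = (n * n for n in range(5))
print(list(squares_gen))   # → [0, 1, 4, 9, 16]

# a generator is exhausted after one pass, unlike a list
print(list(squares_gen))   # → []
```

The trade-off is that a generator can't be indexed, len()-ed, or re-iterated, so it only suits code that makes a single pass over the files.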
src/muse/wizard.py
| "input/Projections.csv", | ||
| ] | ||
| ] | ||
| for file_path in sector_files + preset_files + global_files: |
Thanks @alexdewar - useful advice!
Description
The documentation contains a number of tutorials for customising models (adding new technologies, regions, agents etc.). Each tutorial documents the processes that the user has to carry out to achieve the desired modification, and shows results for a model that has previously been built. I had a go at following along with the tutorials, starting from the default model and making the required changes step-by-step, and found that my results never matched up with the figures in the notebooks. I think the problem is two-fold:
To fix this, we need to:
This isn't so easy, as the current way of generating models is to copy the default model and then manually edit a load of CSV files, which I really don't want to do (this would also be a problem if the default model changes again). I think the best solution would be, rather than hardcoding the models, to generate them programmatically using a generate_model.py file which would be run every time the documentation is built. (Also with regression tests to check whether the outputs of the notebooks have changed.)
Initial plan (April 23rd)
Towards this goal, I've started by documenting the processes that would be required to generate the models with pseudocode (still a work in progress). I don't intend to turn this into real code just yet; this is just to demonstrate the processes that have to be carried out to customise models. I've created 'functions' for:
All of these could be a good starting point for creating some kind of 'wizard' to carry out these functions for the user. The idea is that adding a new feature (agent, region etc.) could consist of copying an existing feature (which could be automated), followed by a series of manual modifications to the CSV files (the steps beginning with >>>). We could also set up the wizard so it can create a 'blank' feature (with everything set to zero), which the user would then manually modify.
That said, I also think that the current CSV structure of the models could be significantly improved (and possibly even replaced by a relational database), in which case all of this would change, so I don't want to go too far down this route just yet without further conversation.
Update (May 17th)
I've gone through and written the necessary scripts for all the tutorials. Within the tutorial-code folder you'll find two new files: generate_models.py and run_models.py, both of which do what they say on the tin. The first one runs in a loop to generate the model input files; the second one runs models in parallel to generate the results files. I'm currently committing all model input files and results files to the repo (as was the case before), rather than running the scripts to generate the files during the documentation build. This is for two reasons:
I wonder if there's a better way of doing these things that doesn't require all these files to be committed?
There is still a lot of work to be done on the contents of the notebook, as there are still many inconsistencies in the text. This PR is more about having a framework in place to generate the model input files programmatically, and I think I'll tackle the contents of the notebooks separately
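A hedged sketch of the split between the two scripts described above; the function bodies and model names are placeholders, not the actual MUSE code:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_model(name: str) -> str:
    # placeholder: copy the default model and apply the tutorial's edits
    return f"generated {name}"

def run_model(name: str) -> str:
    # placeholder: invoke MUSE on the generated input files
    return f"ran {name}"

models = ["1-introduction", "2-add-technology"]  # hypothetical names

# generation runs sequentially, because later tutorials may build
# on the models produced by earlier ones
for model in models:
    generate_model(model)

# the finished models are independent of each other, so the
# simulations can run concurrently
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_model, models))

print(results)  # → ['ran 1-introduction', 'ran 2-add-technology']
```

This ordering constraint is also why the earlier question about one tutorial reusing another's generated files matters: sequential generation makes that reuse safe.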
Original documentation
New documentation
Fixes #99
Fixes #291
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s.
Key checklist
$ python -m pytest
$ python -m sphinx -b html docs docs/build
Further checks