[DRAFT] Scripts for building models#937

Closed
tsmbland wants to merge 1 commit into main from scripts_for_building_models

Conversation

@tsmbland
Collaborator

Description

Please include a summary of the change and which issue is fixed (if any). Please also
include relevant motivation and context. List any dependencies that are required for
this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov

codecov bot commented Oct 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.93%. Comparing base (b80ce22) to head (4819637).
⚠️ Report is 428 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #937   +/-   ##
=======================================
  Coverage   84.93%   84.93%           
=======================================
  Files          50       50           
  Lines        5323     5323           
  Branches     5323     5323           
=======================================
  Hits         4521     4521           
  Misses        574      574           
  Partials      228      228           

☔ View full report in Codecov by Sentry.

@tsmbland
Collaborator Author

tsmbland commented Oct 16, 2025

@alexdewar @dalonsoa This is extremely rough (ignore the horrendous python code), but gets across the idea I have in mind for building the example models programmatically. The main idea is that you can set a base_model in model.toml, then instead of writing full model csv files you can create a series of diff files that describe how to modify the files in the base model. This is inspired by git diff files (same syntax for addition/deletion of lines), but a bit simpler because

  1. Line order is irrelevant (apart from the first line, obviously)
  2. We expect all lines to be unique

I think it would work well for what we need because most changes to models will involve additions/deletions/changes to rows in the csv, rather than columns, as generally all columns are fixed and mandatory (a few exceptions in the case of optional columns where you might want to add a column which would end up changing all rows - not sure what to do about this). Column order is flexible, so I've included the columns in the diff files so we can check this against the base model for safety.
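For illustration, a hypothetical diff file in this format might look like the following (file contents and column names are invented, not taken from the actual PR):

```diff
id,description,unit
-wind,Wind power,PJ
+wind,Onshore wind power,PJ
+solar,Solar PV,PJ
```

The first line repeats the base file's columns so they can be checked against the base model; the -/+ lines delete and add whole rows, and changing a row is expressed as a deletion plus an addition.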

A downside of this format is that it doesn't work with Excel, so maybe we should change the file format slightly to placate Excel users, if this is something we want to provide as a feature for users rather than just developers.

Model changes that involve the toml file (e.g. changing the milestone years) won't be captured here, but this should be easy (e.g. use settings from the base model unless overwritten in the new toml file).

I've built a set of diffs for the example models using a rubbish script which compares two models, which leads to some silly results (e.g. replacing a line to change 1.0 to 1), but gets the idea across. In real life you wouldn't do this - you'd start with your base model, then manually build the diffs on top of this, and obviously you'd only add diffs that are actually meaningful.

Especially for the "missing_commodity" model this makes a lot of sense as it's almost identical to the "simple" model with just a few extra lines, so it doesn't make sense to have a completely separate model with all the data copied, which could potentially get out of sync. "two_regions" is also very similar to "muse1_default". "two_outputs" is a bit more different (based on "simple"), potentially at the point where it's worth being its own model, but I've included the diffs anyway to see what they look like. This would also make a lot of sense for the models in #914 as they're extremely similar to the "simple" model and each other.

If we want to go down this route then we could probably write the code for applying diffs in rust, and support this as a native way of representing a model, i.e. you could run muse2 run directly on a folder of diffs, and it will build the full model under the hood without necessarily having to save the csv files.

What do you think? Good idea? Terrible idea?

@alexdewar
Collaborator

alexdewar commented Oct 16, 2025

As a general idea, I'm not against it, but I'm a bit worried we might end up making our lives harder here. I can see that as we add more and more examples it could be useful to be able to have one example be a "base model" for another, like you suggest. And I think your approach is basically fine, except that it's a hard problem and so there are drawbacks with pretty much any approach.

It's worth remembering that the example files are currently basically part of the code, as they're bundled into the executable file, and I don't think we should change that. I'm not convinced this is functionality that we want end users to have either, because it will make the input format potentially a bit complex and we might end up having to support it indefinitely. There is a use case for wanting to have some kind of templated input for a parameter sweep or something, but I think that's a bigger problem.

I think there are a few ways to go about it:

  1. Have a script to generate examples from templates, but run it manually and commit the result to the repo (like what we do with regression test data)
  2. Generate the models at compile time with a Python script invoked in build.rs
  3. Generate the models at compile time with pure Rust code in build.rs
  4. Generate the models at runtime

For 1: we'd still have to run a script manually and the examples could get out of sync with the script/template files. We could write a pre-commit hook or something to check, but that's more work. If you don't have a working Python installation, you can't update the examples.

For 2: would mean if you don't have a Python installation you can't build MUSE2 locally or install it with cargo install muse2. Or we just don't bundle some of the examples if Python isn't found, but that also seems problematic.

For 3: a better solution than 2, but you can't import anything from the muse2 crate in build.rs (as build.rs is run first), so if we wanted the code to handle the template files to also be in the final executable, we'd have to split it into a separate package that was also published to crates.io.

For 4: possible but will make the code more complex

I feel like we should probably leave this for now until maintaining the examples is becoming a bit more of a problem. If you have to update a few models at a time I don't think that's such a big deal and it is at least explicit. You might also want to deliberately make different changes to different models and that would be a problem for contributors who don't understand our bespoke template format.

I think this is interesting food for thought though! But maybe we should consider all the options a bit more before committing to anything. We did also talk about having more variations on examples that are exclusively used for regression tests and these problems don't apply in the same way to that, so maybe that's somewhere we could try out this approach.

@tsmbland
Collaborator Author

tsmbland commented Oct 16, 2025

Thanks @alexdewar

> As a general idea, I'm not against it, but I'm a bit worried we might end up making our lives harder here. I can see that as we add more and more examples it could be useful to be able to have one example be a "base model" for another, like you suggest. And I think your approach is basically fine, except that it's a hard problem and so there are drawbacks with pretty much any approach.

By "hard problem" do you mean difficult or "hard" in the algorithmic sense? (I don't think it's either to be honest but just wondering what you meant)

> It's worth remembering that the example files are currently basically part of the code, as they're bundled into the executable file, and I don't think we should change that. I'm not convinced this is functionality that we want end users to have either, because it will make the input format potentially a bit complex and we might end up having to support it indefinitely. There is a use case for wanting to have some kind of templated input for a parameter sweep or something, but I think that's a bigger problem.

I actually think this could be very useful for users, and we should encourage this as an approach. Time and time again I've seen MUSE1 users with 10+ models that are all small deviations of one base model, where they've manually copied the base model and made changes. Problem 1 is that it makes it difficult for others to see what perturbations you're testing with each model, or even difficult for you as the model developer to remember this unless you've clearly documented it. Problem 2 is that these models can get out of sync and end up differing in ways that you didn't intend. If you're toying with the base model, to make a change unrelated to the scenarios you're testing, then unless you remember to make the same change in all the other models, those models are no longer just testing the perturbation that you intended, which can lead to very flawed conclusions. I have seen this happen.

Main point: often the very point of a model is to be a specific deviation from a specific base model, and I think we should give users the language to express this.

RE "making the input format more complex" I don't really think so - it's the same format that we currently have just with a "+" or "-" at the beginning of the row. Any approach will involve some new concepts for the users to learn, but I can't really think of anything simpler than this. The massive benefit is that they don't have to write any code. I've currently saved the diff files as ".diff", but they could just as easily be ".csv" with some extra character in position 0,0 to signify that this is a diff file, or rename them to "assets_diff.csv" etc.

If it's a large parameter sweep this is probably less useful - you'd probably have to write a script anyway whether you're writing diff files or whole models. But I think this is less common, I've never actually seen anyone do this in MUSE1.

> I think there are a few ways to go about it:
>
>   1. Have a script to generate examples from templates, but run it manually and commit the result to the repo (like what we do with regression test data)
>   2. Generate the models at compile time with a Python script invoked in build.rs
>   3. Generate the models at compile time with pure Rust code in build.rs
>   4. Generate the models at runtime
>
> For 1: we'd still have to run a script manually and the examples could get out of sync with the script/template files. We could write a pre-commit hook or something to check, but that's more work. If you don't have a working Python installation, you can't update the examples.
>
> For 2: would mean if you don't have a Python installation you can't build MUSE2 locally or install it with cargo install muse2. Or we just don't bundle some of the examples if Python isn't found, but that also seems problematic.
>
> For 3: a better solution than 2, but you can't import anything from the muse2 crate in build.rs (as build.rs is run first), so if we wanted the code to handle the template files to also be in the final executable, we'd have to split it into a separate package that was also published to crates.io.
>
> For 4: possible but will make the code more complex

All good ideas! Given what you've said I think 4 is the best option. I don't think it would make the code much more complex. Maybe a bit fiddly, but once we have the function for merging a base file with a diff file we can use this throughout. I guess it would just take two strings for the source and diff files and return a new string for the merged file.
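A minimal sketch of what that merge function could look like, in Python for brevity (the real thing would presumably be Rust; the function name and the exact header check are my assumptions, and this version requires the diff's header to match the base header exactly rather than allowing flexible column order):

```python
def apply_diff(base: str, diff: str) -> str:
    """Merge a base CSV file (as a string) with a diff file (as a string)
    and return the merged file as a new string.

    Assumptions from the thread: the first line of each file is the
    header, all data lines are unique, and line order is irrelevant.
    """
    base_lines = base.strip().splitlines()
    diff_lines = diff.strip().splitlines()
    header, rows = base_lines[0], set(base_lines[1:])

    # Safety check against the base model, as described in the PR.
    if diff_lines[0] != header:
        raise ValueError("diff header does not match base model header")

    for line in diff_lines[1:]:
        if line.startswith("-"):
            rows.discard(line[1:])  # delete this row from the base
        elif line.startswith("+"):
            rows.add(line[1:])      # add this row
    return "\n".join([header, *sorted(rows)])
```

For example, applying a diff that removes one row and adds another to a two-row base file yields a merged file with the unchanged row plus the new one.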

> I feel like we should probably leave this for now until maintaining the examples is becoming a bit more of a problem.

Perhaps, but it is fast becoming a bigger problem.

> If you have to update a few models at a time I don't think that's such a big deal and it is at least explicit. You might also want to deliberately make different changes to different models and that would be a problem for contributors who don't understand our bespoke template format.

I mean sure, but I think this is a relatively minor barrier for new contributors in the grand scheme of things

> I think this is interesting food for thought though! But maybe we should consider all the options a bit more before committing to anything. We did also talk about having more variations on examples that are exclusively used for regression tests and these problems don't apply in the same way to that, so maybe that's somewhere we could try out this approach.

Great idea! Like what you've suggested in #935 - I don't think it makes sense to add an example model for this, but it would be good to add a test to make sure this is handled properly (probably don't even need to check the results, just make sure it runs to completion, or even just that it passes/fails validation). So yes if we can read in a base model like "simple" and apply changes to it programmatically, that would avoid having to commit an entire set of input files which would be a big win. Maybe I'll start with this then...

@alexdewar
Collaborator

alexdewar commented Oct 16, 2025

> Thanks @alexdewar
>
> > As a general idea, I'm not against it, but I'm a bit worried we might end up making our lives harder here. I can see that as we add more and more examples it could be useful to be able to have one example be a "base model" for another, like you suggest. And I think your approach is basically fine, except that it's a hard problem and so there are drawbacks with pretty much any approach.
>
> By "hard problem" do you mean difficult or "hard" in the algorithmic sense? (I don't think it's either to be honest but just wondering what you meant)

I meant the first, but actually I was mostly concerned about making the build system more complicated. If we go with option 4 though then it'll just be a few extra functions in the code, which is not a big deal at all.

> > It's worth remembering that the example files are currently basically part of the code, as they're bundled into the executable file, and I don't think we should change that. I'm not convinced this is functionality that we want end users to have either, because it will make the input format potentially a bit complex and we might end up having to support it indefinitely. There is a use case for wanting to have some kind of templated input for a parameter sweep or something, but I think that's a bigger problem.
>
> I actually think this could be very useful for users, and we should encourage this as an approach. Time and time again I've seen MUSE1 users with 10+ models that are all small deviations of one base model, where they've manually copied the base model and made changes. Problem 1 is that it makes it difficult for others to see what perturbations you're testing with each model, or even difficult for you as the model developer to remember this unless you've clearly documented it. Problem 2 is that these models can get out of sync and end up differing in ways that you didn't intend. If you're toying with the base model, to make a change unrelated to the scenarios you're testing, then unless you remember to make the same change in all the other models, those models are no longer just testing the perturbation that you intended, which can lead to very flawed conclusions. I have seen this happen.
>
> Main point: often the very point of a model is to be a specific deviation from a specific base model, and I think we should give users the language to express this.
>
> RE "making the input format more complex" I don't really think so - it's the same format that we currently have just with a "+" or "-" at the beginning of the row. Any approach will involve some new concepts for the users to learn, but I can't really think of anything simpler than this. The massive benefit is that they don't have to write any code. I've currently saved the diff files as ".diff", but they could just as easily be ".csv" with some extra character in position 0,0 to signify that this is a diff file, or rename them to "assets_diff.csv" etc.
>
> If it's a large parameter sweep this is probably less useful - you'd probably have to write a script anyway whether you're writing diff files or whole models. But I think this is less common, I've never actually seen anyone do this in MUSE1.

Ok, this is interesting. Maybe it's no bad thing to give users this flexibility, as long as they still have the option to write models with plain old TOML/CSV. It could be in an "advanced" section of the documentation or something.

One thing that's occurred to me is I'm not sure there is a way with the current format to patch the model.toml file itself (i.e. to add/remove/modify params). Is that something we want? One way to do this would be to have a separate template.toml file or whatever that defines base_model and then model.toml can be templated like the CSV files.

> ...
>
> Great idea! Like what you've suggested in #935 - I don't think it makes sense to add an example model for this, but it would be good to add a test to make sure this is handled properly (probably don't even need to check the results, just make sure it runs to completion, or even just that it passes/fails validation). So yes if we can read in a base model like "simple" and apply changes to it programmatically, that would avoid having to commit an entire set of input files which would be a big win. Maybe I'll start with this then...

Cool, let's try this out for tests first then and see how we get on. There are probably loads of extra tests we could/should add this way. It's a bit of a pain defining all the data structures you need in Rust code, just so you can check that one function works correctly. Would be cool if you could write a test along the lines of "build a model based on the simple example with this one extra line and it should fail validation with this message".

@tsmbland
Collaborator Author

> Ok, this is interesting. Maybe it's no bad thing to give users this flexibility, as long as they still have the option to write models with plain old TOML/CSV. It could be in an "advanced" section of the documentation or something.

Yeah yeah definitely not suggesting we take away that option!

> One thing that's occurred to me is I'm not sure there is a way with the current format to patch the model.toml file itself (i.e. to add/remove/modify params). Is that something we want? One way to do this would be to have a separate template.toml file or whatever that defines base_model and then model.toml can be templated like the CSV files.

Yeah we definitely do want this. I think, for example, if you wanted to run the "simple" model with scarcity pricing, ideally you'd just have a folder containing a model.toml file that looks like

```toml
base_model = "examples/simple"
pricing_strategy = "scarcity_adjusted"
```

and it will use all the settings from the base model unless re-defined. If that makes sense

> ...
>
> > Great idea! Like what you've suggested in #935 - I don't think it makes sense to add an example model for this, but it would be good to add a test to make sure this is handled properly (probably don't even need to check the results, just make sure it runs to completion, or even just that it passes/fails validation). So yes if we can read in a base model like "simple" and apply changes to it programmatically, that would avoid having to commit an entire set of input files which would be a big win. Maybe I'll start with this then...
>
> Cool, let's try this out for tests first then and see how we get on. There are probably loads of extra tests we could/should add this way. It's a bit of a pain defining all the data structures you need in Rust code, just so you can check that one function works correctly. Would be cool if you could write a test along the lines of "build a model based on the simple example with this one extra line and it should fail validation with this message".

👍

@alexdewar
Collaborator

> Yeah we definitely do want this. I think, for example, if you wanted to run the "simple" model with scarcity pricing, ideally you'd just have a folder containing a model.toml file that looks like
>
> ```toml
> base_model = "examples/simple"
> pricing_strategy = "scarcity_adjusted"
> ```
>
> and it will use all the settings from the base model unless re-defined. If that makes sense

Yep, makes sense. I just don't know how easy it'll be to get the TOML parsing code to handle this.

It also raises the interesting possibility of having a base model that itself has a base model...

@tsmbland tsmbland requested review from Aurashk and dalonsoa November 3, 2025 11:43
@dalonsoa
Collaborator

dalonsoa commented Nov 5, 2025

I really like the idea, and using a diff syntax like @tsmbland suggests makes it pretty clear, I think. I especially see the advantage of this for end users when they develop slightly different models that still need to be consistent. Diverging models by mistake is a really big problem.

The implementation might be more or less complicated, but I think that's entirely on us and it will be sorted one way or another. So I would focus on making sure that the approach is appropriate for end users. For example, I would suggest the following:

  • Diff files are plain CSV files, but with _diff in the name, matching the name of the file they are modifying, so they can be handled easily by Excel and other tools.
  • The sign (+ or -) should be a column in itself, for the same reason, and is dropped when merging the files.
  • Combined files should not be output unless the user explicitly requests them; otherwise there's a risk of diverging inputs if the user starts modifying those files instead.
  • For cascading base models, I think we just need a recursive function?
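Sketching the second and third bullets (file and column names are hypothetical), a plain-CSV diff with a leading sign column could be applied like this:

```python
import csv
import io


def apply_sign_column_diff(base_csv: str, diff_csv: str) -> str:
    """Apply a diff stored as an ordinary CSV whose first column is a
    sign (+ adds a row, - removes one); the sign column is dropped when
    merging, so both files stay Excel-friendly. A sketch of the
    suggestion above, not actual MUSE2 code."""
    base_rows = list(csv.reader(io.StringIO(base_csv)))
    header, rows = base_rows[0], {tuple(r) for r in base_rows[1:]}
    for record in list(csv.reader(io.StringIO(diff_csv)))[1:]:
        sign, *fields = record
        if sign == "+":
            rows.add(tuple(fields))
        elif sign == "-":
            rows.discard(tuple(fields))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(sorted(rows))
    return out.getvalue()
```

The merged result is never written to disk unless explicitly requested, in line with the third bullet.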

Some initial thoughts, but I definitely see the point of this, maybe more for end users than for the tests, to be honest, although the tests are certainly a beneficiary of this approach.

@tsmbland
Collaborator Author

Thanks for the comments. Closing this draft; #1026 is in progress, taking the suggestions on board.
