
Compute demand map when loading in data #276

Merged

alexdewar merged 21 commits into main from demand-time-slice-map on Jan 17, 2025

Conversation

@alexdewar
Collaborator

Description

The main purpose of this PR is to add code to compute the demand map as we load in data (#231), but along the way I've closed various other issues to do with validating the input data. Unfortunately this has ended up rather bigger than I wanted, but I couldn't think of a sensible way to split it up 😞. V happy to talk through it on Teams for anyone who's interested.

A demand map relates, for a given commodity, the region, milestone year and time slice to the demand for that commodity. There are two input files related to this: demand.csv, which contains the annual demand for each commodity, region and year combination, and demand_slicing.csv, which specifies how the annual demand is shared across different time slices. An additional complexity is that users can specify the demand slicing for a whole season or the whole year rather than just a single time slice. I've created a new DemandMap type, which currently just has a single method to look up the demand for a given region, year and time slice combination. Note that we aren't limited to this API though: while I'm using a HashMap with region + year + time slice as a key, we can still search through the map if, e.g., we want all values for a given region. I was just assuming that this is the way we'll want to look up demand most often, so it made sense to do it this way; we can always add other helpers later if needed.
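Roughly, the shape is something like this (a sketch only: the key struct and its fields are an illustrative guess at one possible internal layout, and only the get signature matches the actual code):

use std::collections::HashMap;
use std::rc::Rc;

use crate::time_slice::TimeSliceID;

/// Illustrative key layout; the real internals may differ
#[derive(PartialEq, Eq, Hash)]
struct DemandMapKey {
    region_id: Rc<str>,
    year: u32,
    time_slice: TimeSliceID,
}

/// A newtype wrapping the map so we control the exported API
pub struct DemandMap(HashMap<DemandMapKey, f64>);

impl DemandMap {
    /// Retrieve the demand for the specified region, year and time slice
    pub fn get(&self, region_id: Rc<str>, year: u32, time_slice: TimeSliceID) -> Option<f64> {
        self.0
            .get(&DemandMapKey { region_id, year, time_slice })
            .copied()
    }
}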

As demand.rs is now getting rather long, I thought about splitting it up, but figured it was worth holding off on that until we refactor all the input code (I've yet to create an issue for that).

Closes #202. Closes #203. Closes #204. Closes #230. Closes #231.

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 99.14530% with 7 lines in your changes missing coverage. Please review.

Project coverage is 95.95%. Comparing base (859f9c3) to head (69aa5d6).
Report is 51 commits behind head on main.

Files with missing lines                Patch %   Lines
src/input/commodity/demand.rs           99.01%    0 Missing and 3 partials ⚠️
src/input/commodity/demand_slicing.rs   99.20%    1 Missing and 2 partials ⚠️
src/time_slice.rs                       98.78%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #276      +/-   ##
==========================================
+ Coverage   95.29%   95.95%   +0.65%     
==========================================
  Files          26       27       +1     
  Lines        2613     3014     +401     
  Branches     2613     3014     +401     
==========================================
+ Hits         2490     2892     +402     
  Misses         47       47              
+ Partials       76       75       -1     


@dalonsoa
Collaborator

I've tried to review the PR but, frankly, I'm having trouble following the code. It was already complicated as it was - hence the suggested code reorganisation - so adding more structures, more encapsulation and more functions linking everything together just feels over-engineered at this point, especially when we do not yet know what structure the problem we need to solve - dispatch, investment - requires.

I think a (possibly long) meeting to discuss things through is necessary, both to fully understand this PR and to define what mathematical problem we actually need to solve, working backwards from that to the appropriate structures.

src/demand.rs Outdated
}

/// A map relating commodity, region and year to annual demand
type AnnualDemandMap = HashMap<AnnualDemandMapKey, f64>;
Collaborator

Why do you define a proper struct for the DemandMap but here you just declare a sort of type alias?

Collaborator Author

I've defined a few type aliases for things we use exclusively within the demand module, but I thought it made sense to make a new type for DemandMap as we're exporting it and doing this lets us define a more user-friendly interface for accessing the data (i.e. various getters).
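To illustrate the distinction (schematic; DemandMapKey is a hypothetical stand-in for the real key type):

// A type alias is just internal shorthand: it is still a plain HashMap,
// and anyone who can see it gets the entire HashMap API
type AnnualDemandMap = HashMap<AnnualDemandMapKey, f64>;

// A newtype is a distinct type: callers of the exported DemandMap only
// get the methods we choose to define on it (e.g. a `get` helper)
pub struct DemandMap(HashMap<DemandMapKey, f64>);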

src/demand.rs Outdated
year: u32,
}

type CommodityRegionPairs = HashSet<(Rc<str>, Rc<str>)>;
Collaborator

And same here.

@alexdewar
Collaborator Author

@dalonsoa I hear what you're saying and I do think it would be worth having a meeting for anyone who's interested to talk about the changes (I guess it makes sense to do it in the new year now). I probably should have given more explanation in the PR description given the scale of the changes, but ended up rushing it a bit -- sorry!

Basically, though, I think we need to have code that does something like what's in this PR and I'm not sure there's any way around it. We might want to do something slightly different with the final DemandMap data structure (I mentioned HashMaps, but that was probably just a distraction); fundamentally, the code I've added is just necessary if you want to be able to figure out what the actual demand is for a particular combination of commodity + region + year + time slice. The reason this code is complicated is effectively because the interface is complicated: we could instead just mandate that users provide a single CSV file with the actual demand for each combination of these parameters and that would give us the information we need; it just wouldn't be very user friendly, hence the whole business of letting users specify how demand is shared across time slices (or ranges of time slices). I'm not saying it's the wrong input format (it seems pretty reasonable to me), but it is somewhat complicated and so the code that processes it will necessarily be complicated too.

Realistically, there's no model where it would ever make sense to store data in the kind of "raw" format we have now, which lets users specify things like "for commodity X and region Y, 40% of the demand is in winter". No model will care what the fraction of annual demand is, just as no model will want to know what the demand is for a whole season rather than for an individual time slice. So these two calculations -- converting fractional demand to actual demand and expanding time slice ranges -- will always be necessary. And if you put off doing them until you actually need to know what the values are, then a) figuring it out will be really slow and b) you won't be able to validate the data properly (e.g. we need to ensure that users don't specify demand slice entries for overlapping time slice ranges, such as winter and winter.day). There really aren't many approaches that let you do this correctly.
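To make those two calculations concrete, here's a minimal self-contained sketch (the types and the function name are hypothetical stand-ins, not the actual code in this PR):

use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct TimeSlice {
    season: String,
    time_of_day: String,
}

/// Expand a season-level demand fraction into actual demand per time slice,
/// weighting by each slice's share of the season's total duration
fn expand_season_demand(
    slices: &[(TimeSlice, f64)], // (time slice, fraction of the year it covers)
    season: &str,
    annual_demand: f64,
    season_fraction: f64, // e.g. 0.4 for "40% of demand is in winter"
) -> HashMap<TimeSlice, f64> {
    let in_season: Vec<_> = slices
        .iter()
        .filter(|(ts, _)| ts.season == season)
        .collect();
    // Total duration of the selected season, used to normalise the shares
    let total: f64 = in_season.iter().map(|(_, dur)| *dur).sum();
    in_season
        .into_iter()
        .map(|(ts, dur)| (ts.clone(), annual_demand * season_fraction * dur / total))
        .collect()
}

With winter split into winter.day and winter.night, expand_season_demand(&slices, "winter", 1000.0, 0.4) would spread 400.0 across the two slices in proportion to their durations.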

With the last few days before the break I'll have a go at making a plan for how to tackle the rest of the project and it probably will make sense to have a meeting to talk this through in the new year. While it generally makes sense to think of the bigger picture, though, whatever we decide won't change how we calculate demand from the input files.

@alexdewar
Collaborator Author

I'm converting this to draft because we've agreed that it makes sense to do the refactoring (#285) in this branch so that it's more readable and easier to review.

It would have been better to do the refactoring first, but unfortunately I'd already started on this demand map stuff by then 🤷

@alexdewar
Collaborator Author

Right, I've refactored it now so that the demand-related CSV code is in input::commodity::demand and input::commodity::demand_slicing which should hopefully help with readability. I moved the remaining data structures in demand.rs into commodity.rs.

I'll mark this as ready for review now but we've agreed that we'll discuss it in a meeting.

@alexdewar marked this pull request as ready for review on January 13, 2025
@alexdewar linked an issue on Jan 13, 2025 that may be closed by this pull request
@alexdewar
Collaborator Author

@tsmbland I've put the calculation of demand share into a separate, more generic function as you suggested.

I thought about writing the function so that it could take a map as an input argument and it would then fill it using the time slice + value pairs, but then I realised the map types will likely be different in every case (i.e. even though the keys will include a time slice component, they'll also likely include other things like region ID etc.). So instead I've made it return an iterator, which hopefully means it'll be fairly easy to reuse.
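Roughly this shape, as a self-contained sketch (plain strings stand in for the real TimeSliceID, and apply_share is a hypothetical name): the helper returns (time slice, value) pairs, and each caller maps them into whatever key type its own map needs before collecting.

use std::collections::HashMap;

/// Pair each time slice with its portion of `value`
fn apply_share<'a>(
    slices: &'a [(String, f64)], // (time slice, normalised share summing to 1)
    value: f64,
) -> impl Iterator<Item = (&'a str, f64)> + 'a {
    slices.iter().map(move |(ts, share)| (ts.as_str(), value * share))
}

fn main() {
    let slices = vec![
        ("winter.day".to_string(), 0.6),
        ("winter.night".to_string(), 0.4),
    ];
    let region = "GBR".to_string();

    // One caller can key its map by time slice alone...
    let by_slice: HashMap<&str, f64> = apply_share(&slices, 100.0).collect();

    // ...while another builds a compound key including the region
    let by_region_slice: HashMap<(String, String), f64> = apply_share(&slices, 100.0)
        .map(|(ts, v)| ((region.clone(), ts.to_string()), v))
        .collect();

    assert_eq!(by_slice["winter.day"], 60.0);
    assert_eq!(by_region_slice[&("GBR".to_string(), "winter.night".to_string())], 40.0);
}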

@tsmbland
Collaborator

In hindsight, I think it would have been better (for now) to input actual demand data at the timeslice-level, and worry about fractions and time-slice ranges (which is ultimately just a user-experience choice) at a later date. However, since we haven't done that, I agree that something along these lines is necessary, both for validation and giving meaningful numbers to the optimization.

I really don't have any idea whether the HashMap structure with region, year and timeslice as keys is the right choice. I imagine this will become clearer as we go through, but we need a starting point and this seems as good as any.

In terms of the code, it seems reasonable and well tested, but it's very long and complicated, so I'm giving it some benefit of the doubt.


/// Retrieve the demand for the specified region, year and time slice
pub fn get(&self, region_id: Rc<str>, year: u32, time_slice: TimeSliceID) -> Option<f64> {
self.0
Collaborator

Why .0?

Collaborator Author

DemandMap is what's called a newtype. Essentially it's a struct with an unnamed HashMap inside it, and self.0 is how you access it.

value: f64,
) -> impl Iterator<Item = (&'a TimeSliceID, f64)> {
self.iterate_selection_share(selection)
.map(move |(ts, share)| (ts, value * share))
Collaborator

I believe this is the most important line in terms of the maths. Could you just explain in human English what this is doing? E.g. what's move?

Collaborator Author

iterate_selection_share returns an iterator of the relative lengths for each of the time slices (i.e. normalised so they sum to one) along with the time slice. This line just multiplies each of these relative lengths by value. I was in two minds about whether to split the calculation into two functions like this or just have the one tbh.

The move thing is just to do with Rust's ownership rules -- it doesn't affect the calculation. It's just that otherwise we wouldn't be able to return an iterator which uses value, because value goes out of scope at the end of this function. So the move "gives" value to this lambda function.
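As a standalone illustration of the move part (nothing from the PR; the numbers are arbitrary): without move the closure would merely borrow value, which ceases to exist when the function returns, so the compiler would reject it.

// `move` copies `value` (an f64) into the closure, letting the returned
// iterator outlive the function's local variables
fn scale(shares: Vec<f64>, value: f64) -> impl Iterator<Item = f64> {
    shares.into_iter().map(move |share| value * share)
}

fn main() {
    let scaled: Vec<f64> = scale(vec![0.25, 0.75], 100.0).collect();
    assert_eq!(scaled, vec![25.0, 75.0]);
}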

@alexdewar
Collaborator Author

> In hindsight, I think it would have been better (for now) to input actual demand data at the timeslice-level, and worry about fractions and time-slice ranges (which is ultimately just a user-experience choice) at a later date. However, since we haven't done that, I agree that something along these lines is necessary, both for validation and giving meaningful numbers to the optimization.

I agree. In retrospect I wish I hadn't gone so far down this rabbit hole 🙃. Oh well. We would have needed to do this at some point anyway, so at least it's done now.

> I really don't have any idea whether the HashMap structure with region, year and timeslice as keys is the right choice. I imagine this will become clearer as we go through, but we need a starting point and this seems as good as any.

Yeah, we can always fiddle with it later (I think whatever we come up with though, it'll involve some kind of HashMap). I think this data structure kind of makes sense for loading in and validating the data in any case, but we may want to turn it into something else after we've done that.

> In terms of the code, it seems reasonable and well tested, but it's very long and complicated, so I'm giving it some benefit of the doubt.

👍

@alexdewar merged commit f67887d into main on Jan 17, 2025
7 checks passed
@alexdewar deleted the demand-time-slice-map branch on January 17, 2025