Compute demand map when loading in data#276
Conversation
Also make private.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #276 +/- ##
==========================================
+ Coverage 95.29% 95.95% +0.65%
==========================================
Files 26 27 +1
Lines 2613 3014 +401
Branches 2613 3014 +401
==========================================
+ Hits 2490 2892 +402
Misses 47 47
+ Partials 76 75 -1 ☔ View full report in Codecov by Sentry. |
dalonsoa
left a comment
There was a problem hiding this comment.
I've tried to review the PR but, frankly, I'm having trouble following the code. It was already complicated as it was - hence the suggested code reorganisation - so adding more structures, more encapsulations and more functions linking everything together just feels overengineer at this point. Especially when we do not know, yet, what the structure to solve the problem - dispatch, investment - needs to be.
I think a (possibly long) meeting to discuss things through is necessary, to fully understand this PR. And define what mathematical problem we need to solve in reality, working backwards to define the appropriate structures to solve it.
src/demand.rs
Outdated
| } | ||
|
|
||
| /// A map relating commodity, region and year to annual demand | ||
| type AnnualDemandMap = HashMap<AnnualDemandMapKey, f64>; |
There was a problem hiding this comment.
Why do you define a proper struct for the DemandMap but here you just declare a sort of type alias?
There was a problem hiding this comment.
I've defined a few type aliases for things we use exclusively within the demand module, but I thought it made sense to make a new type for DemandMap as we're exporting it and doing this lets us define a more user-friendly interface for accessing the data (i.e. various getters).
src/demand.rs
Outdated
| year: u32, | ||
| } | ||
|
|
||
| type CommodityRegionPairs = HashSet<(Rc<str>, Rc<str>)>; |
|
@dalonsoa I hear what you're saying and I do think it would be worth having a meeting for anyone who's interested to talk about the changes (I guess it makes sense to do it in the new year now). I probably should have given more explanation in the PR description given the scale of the changes, but ended up rushing it a bit -- sorry! Basically, though, I think we need to have code that does something like what's in this PR and I'm not sure there's any way around it. We might want to do something slightly different with the final Realistically, there's no model where it would ever make sense to store data in the kind of "raw" format we have now, which lets users specify things like "for commodity X and region Y, 40% of the demand is in winter". No model will care what the fraction of annual demand is, just as no model will want to know what the demand is for a whole season rather than for an individual time slice. So these two calculations -- converting fractional demand to actual demand and expanding time slice ranges -- will always be necessary. And if you put off doing this until you actually need to know what the values are a) figuring it out will be really slow and b) you won't be able to validate the data properly (e.g. we need to ensure that users don't specify demand slice entries for overlapping time slice ranges, such as With the last few days before the break I'll have a go at making a plan for how to tackle the rest of the project and it probably will make sense to have a meeting to talk this through in the new year. While it generally makes sense to think of the bigger picture, though, whatever we decide won't change how we calculate demand from the input files. |
|
I'm converting this to draft because we've agreed that it makes sense to do the refactoring (#285) in this branch so that it's more readable and easier to review. It would have been better to do the refactoring first, but unfortunately I'd already started on this demand map stuff by then 🤷 |
|
Right, I've refactored it now so that the demand-related CSV code is in I'll mark this as ready for review now but we've agreed that we'll discuss it in a meeting. |
|
@tsmbland I've put the calculation of demand share into a separate, more generic function as you suggested. I thought about writing the function so that it could take a map as an input argument and it would then fill it using the time slice + value pairs, but then I realised the map types will likely be different in every case (i.e. even though the keys will include a time slice component, they'll also likely include other things like region ID etc.). So instead I've made it return an iterator, which hopefully means it'll be fairly easy to reuse. |
tsmbland
left a comment
There was a problem hiding this comment.
In hindsight, I think it would have been better (for now) to input actual demand data at the timeslice-level, and worry about fractions and time-slice ranges (which is ultimately just a user-experience choice) at a later date. However, since we haven't done that, I agree that something along these lines is necessary, both for validation and giving meaningful numbers to the optimization.
I really don't have any idea whether the HashMap structure with region, year and timeslice as keys is the right choice. I imagine this will become clearer as we go through, but we need a starting point and this seems as good as any.
In terms of the code, it seems reasonable and well tested, but it's very long and complicated so I'm giving some benefit of the doubt
|
|
||
| /// Retrieve the demand for the specified region, year and time slice | ||
| pub fn get(&self, region_id: Rc<str>, year: u32, time_slice: TimeSliceID) -> Option<f64> { | ||
| self.0 |
There was a problem hiding this comment.
DemandMap is what's called a newtype. Essentially it's a struct with an unnamed HashMap inside it, and self.0 is how you access it.
| value: f64, | ||
| ) -> impl Iterator<Item = (&'a TimeSliceID, f64)> { | ||
| self.iterate_selection_share(selection) | ||
| .map(move |(ts, share)| (ts, value * share)) |
There was a problem hiding this comment.
I believe this is the most important line in terms of the maths. Could you just explain in human English what this is doing? E.g. what's move?
There was a problem hiding this comment.
iterate_selection_share returns an iterator of the relative lengths for each of the time slices (i.e. normalised so they sum to one) along with the time slice. This line just multiplies each of these relative lengths by value. I was in two minds about whether to split the calculation into two functions like this or just have the one tbh.
The move thing is just to do with Rust's ownership rules -- it doesn't affect the calculation. It's just that otherwise we wouldn't be able to return an iterator which uses value, because value goes out of scope at the end of this function. So the move "gives" value to this lambda function.
I agree. In retrospect I wish I hadn't gone so far down this rabbit hole 🙃. Oh well. We would have needed to do this at some point anyway, so at least it's done now.
Yeah, we can always fiddle with it later (I think whatever we come up with though, it'll involve some kind of
👍 |
Description
The main purpose of this PR is to add code to compute the demand map as we load in data (#231), but along the way I've closed various other issues to do with validating the input data. Unfortunately this has ended up rather bigger than I wanted, but I couldn't think of a sensible way to split it up 😞. V happy to talk through it on Teams for anyone who's interested.
A demand map relates, for a given commodity, the region, milestone year and time slice to the demand for that commodity. There are two input files related to this:
demand.csv, which contains the annual demand for each commodity, region and year combination, anddemand_slicing.csv, which specifies how the annual demand is shared across different time slices. An additional complexity is that users can specify the demand slicing for a whole season or the whole year rather than just a single time slice. I've created a newDemandMaptype, which currently just has a single method to look up the demand for a given region, year and time slice combination. Note that we aren't limited to this API though: while I'm using aHashMapwith region + year + time slice as a key, we can still search through the map if, e,g, we want all values for a given region. I was just assuming that this is the way we'll want to look up demand most often, so it made sense to do it this way; we can always add other helpers later if needed.As
demand.rsis now getting rather long, I thought about splitting it up, but figured it was worth holding off on that until we refactor all the input code (I'm yet to create an issue for that).Closes #202. Closes #203. Closes #204. Closes #230. Closes #231.
Type of change
Key checklist
$ cargo test$ cargo docFurther checks