Conversation
@TinyMarsh I was just talking with @tsmbland today about adding a …
@tsmbland take a look at this PR; it adds the … I need help with 2 things: …
Let me know if you wanna have a chat about this.
Just spent a while trying to understand the general import/validation process. It varies for different tables depending on how much validation is required, but in general I think it looks like this (where X could be parameter, flow, PAC etc.): …
I'm assuming it's similar for other modules, although I haven't looked closely. Anyway, I found it helpful to write out the process, so figured I'd share it here. I guess this is what we're working towards.
@TinyMarsh To answer your questions: …
@alexdewar Is that all correct? Anything to add?
Thanks @tsmbland, that's v comprehensive! I don't have much to add, but more recently I've been avoiding having separate …
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #198      +/-   ##
==========================================
+ Coverage   95.10%   95.14%   +0.03%
==========================================
  Files          13       13
  Lines        2532     2614      +82
  Branches     2532     2614      +82
==========================================
+ Hits         2408     2487      +79
- Misses         47       50       +3
  Partials       77       77
```

☔ View full report in Codecov by Sentry.
Okay, I think I'm finally starting to understand how this all works. Thanks for your comprehensive summary @tsmbland! This might be ready for a review now @alexdewar
alexdewar
left a comment
Definitely along the right lines, but there are a few minor issues with the current implementation.
- We want to return errors rather than panicking
- It'd be better to have a separate function that just processes data without doing any I/O
- Would you mind writing a little test for this? There's only one kind of error you can get atm (bad commodity ID), so you just need to test this and the success case.
AdrianDAlessandro
left a comment
One small comment. Agree with the tweaks Alex suggested. Looking good overall though!
|
Just double checking some logic here; most of the … However, with the … Just double checking this is correct behaviour, i.e. there are multiple …
Thanks for the feedback all. This should be ready for a re-review now.
@alexdewar something in particular I wanted to get your feedback on is this potential piece of silliness: I don't like the use of `let iter = process_flow_raws.get(process_id).unwrap().into_iter();`. I wanted to avoid getting and passing a reference to … @AdrianDAlessandro helped with this and together we landed on using …
alexdewar
left a comment
Getting there, but I still think you can put more of the processing into the `read_process_flows_from_iter` function.
I think the general approach should be:
1. `read_csv()`
2. Process the results (with `map()`)
3. Group the results by ID

Atm you're doing the grouping before the processing, which then makes things a bit more fiddly.
For an example, see `read_process_pacs_from_iter`: https://github.com/EnergySystemsModellingLab/MUSE_2.0/blob/main/src/process.rs#L329
(Though in your case you'll be able to use the `into_id_map()` helper rather than `into_group_map()`.)
Ah okay, apologies for the faff here, I don't think I really understood the assignment initially. I have refactored the logic as you described @alexdewar, as the following:

```rust
fn read_process_flows_from_iter<I>(
    iter: I,
    process_ids: &HashSet<Rc<str>>,
    commodities: &HashMap<Rc<str>, Rc<Commodity>>,
) -> Result<HashMap<Rc<str>, Vec<ProcessFlow>>>
where
    I: Iterator<Item = ProcessFlowRaw>,
{
    iter.map(|flow_raw| {
        let process_id = process_ids.get_id(&flow_raw.process_id)?;
        let commodity = commodities
            .get(flow_raw.commodity_id.as_str())
            .with_context(|| format!("{} is not a valid commodity ID", &flow_raw.commodity_id))?;
        let process_flow = ProcessFlow {
            process_id: flow_raw.process_id,
            commodity: Rc::clone(commodity),
            flow: flow_raw.flow,
            flow_type: flow_raw.flow_type,
            flow_cost: flow_raw.flow_cost,
        };
        Ok((process_id, process_flow))
    })
    .into_id_map(process_ids)
}

fn read_process_flows(
    model_dir: &Path,
    process_ids: &HashSet<Rc<str>>,
    commodities: &HashMap<Rc<str>, Rc<Commodity>>,
) -> Result<HashMap<Rc<str>, Vec<ProcessFlow>>> {
    let file_path = model_dir.join(PROCESS_FLOWS_FILE_NAME);
    let process_flow_csv = read_csv(&file_path)?;
    read_process_flows_from_iter(process_flow_csv, process_ids, commodities)
        .with_context(|| input_err_msg(&file_path))
}
```

But I don't understand what the issue with using …
Never mind, I wasn't paying attention to what …
tsmbland
left a comment
Looks good as far as I can tell! I'd just add an inline comment to `read_process_flows_from_iter` so it's clear what it's doing.
Also, shall we delete `read_csv_grouped_by_id` since it's no longer used anywhere?
alexdewar
left a comment
I agree with @tsmbland's comment. I think there's one other minor tweak to make (see comment), but other than that, we're good to go!
Sorry this has ended up going through so many revisions... I should have been clearer in the issue description 😟
src/process.rs

```rust
.collect::<Result<Vec<_>>>()?
.into_iter()
.into_id_map(process_ids)
```

If you do it this way, you can avoid allocating a `Vec` just to throw it away again:

```diff
-.collect::<Result<Vec<_>>>()?
-.into_iter()
-.into_id_map(process_ids)
+.process_results(|iter| iter.into_id_map(process_ids))?
```
Removes the unused `read_csv_grouped_by_id` function. Adds a comment to `read_process_flows_from_iter` so it's clear what it's doing. Refactors code to make use of `process_results`.
Thanks both. I'll merge at the end of the day unless I hear from you regarding the latest commit.

Description
This PR introduces a new struct for raw process flows, and modifies the `ProcessFlow` struct to use a reference-counted `Commodity` object.

Fixes #166
Type of change
Key checklist
- `$ cargo test`
- `$ cargo doc`

Further checks