
Create ProcessFlowRaw struct#198

Merged
TinyMarsh merged 10 commits into main from flow-commodity-field on Dec 20, 2024

Conversation

@TinyMarsh (Collaborator) commented Nov 11, 2024

Description

This PR introduces a new struct for raw process flows and modifies the ProcessFlow struct to use a reference-counted Commodity object.

Fixes #166

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@alexdewar (Collaborator) commented

@TinyMarsh I was just talking with @tsmbland today about adding a ProcessFlowRaw struct as part of #248. I guess you guys should coordinate!

@TinyMarsh TinyMarsh marked this pull request as ready for review December 11, 2024 11:16
@TinyMarsh (Collaborator, Author) commented

@tsmbland take a look at this PR; it adds the ProcessFlowRaw struct and a method for converting to ProcessFlow.

I need help with two things:

  1. What sort of validation is needed?
  2. How to actually implement the into_flow method.

Let me know if you wanna have a chat about this.

@tsmbland (Collaborator) commented Dec 11, 2024

Just spent a while trying to understand the general import/validation process. It varies between tables depending on how much validation is required, but in general I think it looks like this (where X could be parameter, flow, PAC, etc.):

ProcessX

  • final struct that will be used elsewhere in the program
  • relates to a single row of the table
  • no optional types

ProcessXRaw

  • initial struct after reading in the data from the file
  • may contain optional types

ProcessXRaw.into_x

  • applies defaults to optional values
  • calls validate
  • outputs a ProcessX

ProcessXRaw.validate

  • called by into_x
  • performs validation on each individual struct (e.g. checking values are above zero)

read_process_xs

  • calls read_csv to create an iterator of ProcessXRaw
  • passes the iterator to read_process_xs_from_iter

read_process_xs_from_iter

  • calls into_x for each ProcessXRaw in the iterator to create a ProcessX
  • organises data into a hashmap of ProcessX
  • performs some validation on the full dataset (e.g. ensuring no processes are missing)

I'm assuming it's similar for other modules, although I haven't looked closely.

Anyway, I found it helpful to write out the process so figured I'd share it here. I guess this is what we're working towards for ProcessFlow.
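As a rough illustration of that Raw → final pattern, here is a minimal sketch with a made-up ProcessParamRaw/ProcessParam pair and an arbitrary default; the real structs and their fields will differ:

```rust
// Hypothetical minimal sketch of the Raw -> final pattern described above.
// ProcessParamRaw / ProcessParam and the default value are illustrative only.
#[derive(Debug)]
struct ProcessParamRaw {
    process_id: String,
    value: Option<f64>, // may be missing in the input file
}

#[derive(Debug, PartialEq)]
struct ProcessParam {
    process_id: String,
    value: f64, // no optional types in the final struct
}

impl ProcessParamRaw {
    /// Per-row validation (e.g. checking values are above zero).
    fn validate(&self) -> Result<(), String> {
        if matches!(self.value, Some(v) if v <= 0.0) {
            return Err(format!("value for {} must be positive", self.process_id));
        }
        Ok(())
    }

    /// Calls validate, applies defaults to optional values, outputs a ProcessParam.
    fn into_param(self) -> Result<ProcessParam, String> {
        self.validate()?;
        Ok(ProcessParam {
            process_id: self.process_id,
            value: self.value.unwrap_or(1.0), // illustrative default
        })
    }
}

fn main() {
    let raw = ProcessParamRaw { process_id: "proc1".into(), value: None };
    let param = raw.into_param().unwrap();
    assert_eq!(param.value, 1.0); // default applied
    println!("{:?}", param);
}
```

A read_process_params_from_iter-style function would then call into_param on each row and do the whole-dataset checks on the result.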

@TinyMarsh To answer your questions:

  1. into_flow will get called by read_process_flows_from_iter, which doesn't exist yet

  2. I've got #248 (Check that no commodity is both an input and an output of a process) assigned to me, which I guess will be implemented in read_process_flows_from_iter (when it exists). That might be everything, to be honest.

@alexdewar Is that all correct? Anything to add?

@alexdewar (Collaborator) commented

Thanks @tsmbland, that's v comprehensive!

I don't have much to add, but more recently I've been avoiding having separate into_x() methods etc. and I've just put everything directly into the read_x_from_iter() functions, because I found that splitting things up was making the code a bit messy. Totally up to you though.

Your read_process_flows_from_iter function should just take an iterator of ProcessFlowRaws and turn them into a HashMap<Rc<str>, Vec<ProcessFlow>> (where the key is the process ID). See example here: https://github.com/EnergySystemsModellingLab/MUSE_2.0/blob/main/src/process.rs#L329
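As a sketch of that shape, the grouping step might look like the following, with a stripped-down hypothetical Flow struct standing in for ProcessFlow:

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for ProcessFlow; the real struct has more fields.
#[derive(Debug)]
struct Flow {
    process_id: Rc<str>,
    flow: f64,
}

/// Groups an iterator of flows into a map keyed by process ID.
fn group_by_process(flows: impl Iterator<Item = Flow>) -> HashMap<Rc<str>, Vec<Flow>> {
    let mut map: HashMap<Rc<str>, Vec<Flow>> = HashMap::new();
    for flow in flows {
        // Rc::clone is a cheap pointer copy, not a string copy
        map.entry(Rc::clone(&flow.process_id)).or_default().push(flow);
    }
    map
}

fn main() {
    let id: Rc<str> = Rc::from("proc1");
    let flows = vec![
        Flow { process_id: Rc::clone(&id), flow: 1.0 },
        Flow { process_id: Rc::clone(&id), flow: -0.5 },
    ];
    let map = group_by_process(flows.into_iter());
    assert_eq!(map["proc1"].len(), 2);
}
```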

@codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 98.07692% with 3 lines in your changes missing coverage. Please review.

Project coverage is 95.14%. Comparing base (19b2cf3) to head (c29a99a).
Report is 20 commits behind head on main.

Files with missing lines | Patch % | Lines
src/process.rs | 98.07% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #198      +/-   ##
==========================================
+ Coverage   95.10%   95.14%   +0.03%     
==========================================
  Files          13       13              
  Lines        2532     2614      +82     
  Branches     2532     2614      +82     
==========================================
+ Hits         2408     2487      +79     
- Misses         47       50       +3     
  Partials       77       77              


@TinyMarsh (Collaborator, Author) commented

Okay, I think I'm finally starting to understand how this all works. Thanks for your comprehensive summary @tsmbland!

This might be ready for a review now @alexdewar

@TinyMarsh TinyMarsh requested a review from tsmbland December 13, 2024 09:50
@alexdewar (Collaborator) left a comment

Definitely along the right lines, but there are a few minor issues with the current implementation.

  1. We want to return errors rather than panicking
  2. It'd be better to have a separate function that just processes data without doing any I/O
  3. Would you mind writing a little test for this? There's only one kind of error you can get atm (bad commodity ID), so you just need to test this and the success case.

@AdrianDAlessandro (Collaborator) left a comment

One small comment. Agree with the tweaks Alex suggested. Looking good overall though!

@TinyMarsh (Collaborator, Author) commented

Just double-checking some logic here: most of the read_process_xs functions return an iterator whose item type is a single ProcessXRaw object, produced by the read_csv function.

However, with the ProcessFlowRaw logic, instead of calling read_csv we call read_csv_grouped_by_id, which returns not an iterator of ProcessFlowRaw but a HashMap with process_id as the key and a vector of ProcessFlowRaw as the value.

Just double-checking this is the correct behaviour, i.e. there are multiple ProcessFlowRaws per process_id.

@TinyMarsh (Collaborator, Author) commented

Thanks for the feedback all. This should be ready for a re-review now.

@TinyMarsh (Collaborator, Author) commented Dec 16, 2024

@alexdewar something in particular I wanted to get your feedback on is this potential piece of silliness:
https://github.com/EnergySystemsModellingLab/MUSE_2.0/blob/c4fdd9fda27889edc9cd4e7740c6b3451984e0e6/src/process.rs#L339

I don't like the use of remove here, since that function implies functionality beyond what I'm actually using it for. What I'm using it for is to avoid getting a reference to an iterator of type ProcessFlowRaw, which is what I would get if I used get(), e.g.:

let iter = process_flow_raws.get(process_id).unwrap().into_iter();

I wanted to avoid getting and passing a reference to read_process_flows_from_iter() because that involved lots of reference lifetime stuff that seemed unnecessarily complicated to implement. Hopefully that makes sense.

@AdrianDAlessandro helped with this and together we landed on using remove(). Anything to add Adrian?
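For what it's worth, the ownership distinction driving that choice can be seen in a tiny std-only example: get() hands back a reference tied to the map's borrow, while remove() hands back the owned Vec, which can then be consumed with no lifetime plumbing:

```rust
use std::collections::HashMap;

fn main() {
    let mut groups: HashMap<String, Vec<u32>> = HashMap::new();
    groups.insert("proc1".to_string(), vec![1, 2, 3]);

    // get() borrows: we only ever see &Vec<u32>, so a consuming iterator
    // over owned values is not available without cloning.
    let borrowed_sum: u32 = groups.get("proc1").unwrap().iter().sum();

    // remove() transfers ownership: the Vec leaves the map, so it can be
    // moved into a consuming iterator with no borrow of the map remaining.
    let owned: Vec<u32> = groups.remove("proc1").unwrap();
    let owned_sum: u32 = owned.into_iter().sum();

    assert_eq!(borrowed_sum, owned_sum);
    assert!(groups.is_empty()); // the side effect: the entry is gone
}
```

The trade-off is exactly that side effect: remove() empties the map as you go, which is fine if the map is only consumed once.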

@alexdewar (Collaborator) left a comment

Getting there, but I still think you can put more of the processing into read_process_flows_from_iter function.

I think the general approach should be:

  1. read_csv()
  2. Process the results (with map())
  3. Group the results by ID

Atm you're doing the grouping before the processing, which then makes things a bit more fiddly.

For an example, see read_process_pacs_from_iter: https://github.com/EnergySystemsModellingLab/MUSE_2.0/blob/main/src/process.rs#L329

(Though in your case you'll be able to use the into_id_map() helper rather than into_group_map()).
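That read → map → group order can be sketched std-only like this (FlowRaw/Flow and the zero-flow check are hypothetical stand-ins; the real code uses read_csv and the into_id_map() helper):

```rust
use std::collections::HashMap;

// Hypothetical raw row, as deserialised from the CSV.
struct FlowRaw {
    process_id: String,
    flow: f64,
}

// Hypothetical processed row.
#[derive(Debug)]
struct Flow {
    process_id: String,
    flow: f64,
}

/// 1. take the rows, 2. process them with map(), 3. group by ID last.
fn read_flows(
    rows: impl Iterator<Item = FlowRaw>,
) -> Result<HashMap<String, Vec<Flow>>, String> {
    let processed = rows.map(|raw| {
        // per-row validation happens during processing, before grouping
        if raw.flow == 0.0 {
            return Err(format!("zero flow for process {}", raw.process_id));
        }
        Ok(Flow { process_id: raw.process_id, flow: raw.flow })
    });

    let mut map: HashMap<String, Vec<Flow>> = HashMap::new();
    for result in processed {
        let flow = result?; // first bad row aborts the whole read
        map.entry(flow.process_id.clone()).or_default().push(flow);
    }
    Ok(map)
}

fn main() {
    let rows = vec![
        FlowRaw { process_id: "proc1".into(), flow: 1.0 },
        FlowRaw { process_id: "proc1".into(), flow: -1.0 },
    ];
    let map = read_flows(rows.into_iter()).unwrap();
    assert_eq!(map["proc1"].len(), 2);
}
```

Doing the grouping last means the map step only ever deals with one flat row at a time, which keeps the error handling simple.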

@TinyMarsh (Collaborator, Author) commented

Ah okay, apologies for the faff here, I don't think I really understood the assignment initially.

I have refactored the logic as you described @alexdewar, as follows:

fn read_process_flows_from_iter<I>(
    iter: I,
    process_ids: &HashSet<Rc<str>>,
    commodities: &HashMap<Rc<str>, Rc<Commodity>>,
) -> Result<HashMap<Rc<str>, Vec<ProcessFlow>>>
where
    I: Iterator<Item = ProcessFlowRaw>,
{
    iter.map(|flow_raw| {
        let process_id = process_ids.get_id(&flow_raw.process_id)?;
        let commodity = commodities
            .get(flow_raw.commodity_id.as_str())
            .with_context(|| format!("{} is not a valid commodity ID", &flow_raw.commodity_id))?;

        let process_flow = ProcessFlow {
            process_id: flow_raw.process_id,
            commodity: Rc::clone(commodity),
            flow: flow_raw.flow,
            flow_type: flow_raw.flow_type,
            flow_cost: flow_raw.flow_cost,
        };

        Ok((process_id, process_flow))
    })
    .into_id_map(process_ids)
}

fn read_process_flows(
    model_dir: &Path,
    process_ids: &HashSet<Rc<str>>,
    commodities: &HashMap<Rc<str>, Rc<Commodity>>,
) -> Result<HashMap<Rc<str>, Vec<ProcessFlow>>> {
    let file_path = model_dir.join(PROCESS_FLOWS_FILE_NAME);
    let process_flow_csv = read_csv(&file_path)?;
    read_process_flows_from_iter(process_flow_csv, process_ids, commodities)
        .with_context(|| input_err_msg(&file_path))
}

But I don't understand what the issue with using into_id_map is here.


@TinyMarsh (Collaborator, Author) commented

> Ah okay, apologies for the faff here, I don't think I really understood the assignment initially.

Never mind, I wasn't paying attention to what into_id_map is doing. There were some minor conflicts due to #273 but they should be fixed okay now.

@TinyMarsh TinyMarsh requested a review from alexdewar December 19, 2024 18:50
@tsmbland (Collaborator) left a comment

Looks good as far as I can tell! I'd just add an inline comment to read_process_flows_from_iter so it's clear what it's doing.

Also, shall we delete read_csv_grouped_by_id since it's no longer used anywhere?

@alexdewar (Collaborator) left a comment

I agree with @tsmbland's comment. I think there's one other minor tweak to make (see comment), but other than that, we're good to go!

Sorry this has ended up going through so many revisions... I should have been clearer in the issue description 😟

src/process.rs Outdated
Comment on lines 314 to 316
.collect::<Result<Vec<_>>>()?
.into_iter()
.into_id_map(process_ids)

If you do it this way, you can avoid allocating a Vec just to throw it away again:

Suggested change:
- .collect::<Result<Vec<_>>>()?
- .into_iter()
- .into_id_map(process_ids)
+ .process_results(|iter| iter.into_id_map(process_ids))?
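For anyone unfamiliar with process_results (from the itertools crate): it lets a closure consume a Result iterator as if it were infallible, short-circuiting on the first Err, with no intermediate Vec. A rough std-only stand-in for its behaviour (not itertools' actual implementation) looks like:

```rust
/// Rough std-only stand-in for itertools::process_results: runs `f` over the
/// Ok values of a fallible iterator, stopping at the first Err.
fn process_results<T, E, I, R>(
    iter: I,
    f: impl FnOnce(&mut dyn Iterator<Item = T>) -> R,
) -> Result<R, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut error: Option<E> = None;
    let out = {
        // scan stops yielding as soon as an Err appears, stashing the error
        let mut ok_values = iter.scan(&mut error, |err, item| match item {
            Ok(value) => Some(value),
            Err(e) => {
                **err = Some(e);
                None
            }
        });
        f(&mut ok_values)
    };
    match error {
        Some(e) => Err(e),
        None => Ok(out),
    }
}

fn main() {
    let ok: Vec<Result<u32, String>> = vec![Ok(1), Ok(2), Ok(3)];
    assert_eq!(process_results(ok.into_iter(), |it| it.sum::<u32>()), Ok(6));

    let bad: Vec<Result<u32, String>> = vec![Ok(1), Err("bad".into()), Ok(3)];
    assert_eq!(
        process_results(bad.into_iter(), |it| it.sum::<u32>()),
        Err("bad".to_string())
    );
}
```

So in the suggested change, into_id_map sees a plain iterator of Ok values, and any CSV-row error still propagates out of the function.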

Removes the unused read_csv_grouped_by_id function. Adds a comment to read_process_flows_from_iter so it's clear what it's doing. Refactors code to make use of process_results.
@TinyMarsh
Copy link
Collaborator Author

Thanks both. I'll merge at the end of the day unless I hear from you regarding the latest commit.

@TinyMarsh TinyMarsh merged commit 86cfaad into main Dec 20, 2024
7 checks passed
@TinyMarsh TinyMarsh deleted the flow-commodity-field branch December 20, 2024 15:14

Successfully merging this pull request may close these issues.

Make ProcessFlow::commodity_id into Rc<Commodity>

4 participants