Refined Node execution/validation; flow variable substitution#57
Made all Parameters one line per argument for readability/version control. Increases number of lines for 'small' parameters, but should make any future changes clearer.
`validate` is now defined in the parent Node class. It checks that all options are valid, raising `ParameterValidationError` if not. Calls to `node.validate()` are removed from `pyworkflow` when adding or updating a Node or an edge between Nodes. Instead, `validate` is called when the POST route for updating a Node is hit, to check that any changes to the configuration are valid.
Removes `__init__` from FilterNode, changes `execute` to raise `NotImplementedError()` if not present.
All Nodes we have implemented require exactly the defined number of input nodes, so the logic can be simplified to checking whether two values are equal.
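A minimal sketch of how the shared validation described above might look. Only `validate`, `execute`, `ParameterValidationError`, and the "sep" option come from this PR; the `StringParameter` stand-in, the `num_in` attribute, the `options` dictionary layout, and `has_valid_input_count` are assumptions made for illustration and may not match the actual classes.

```python
class ParameterValidationError(Exception):
    """Raised when an option holds an invalid value (name taken from the PR)."""


class StringParameter:
    """Hypothetical stand-in for one of the Parameter classes."""

    def __init__(self, label, value=None):
        self.label = label
        self.value = value

    def validate(self):
        if not isinstance(self.value, str):
            raise ParameterValidationError(f"{self.label!r} must be a string")


class Node:
    """Simplified parent Node showing only the behavior discussed above."""

    num_in = 0  # assumed attribute: number of input nodes this Node requires

    def __init__(self, options=None):
        # assumed layout: option name -> Parameter instance
        self.options = dict(options or {})

    def validate(self):
        # Each Parameter validates itself, so subclasses need no override.
        for option in self.options.values():
            option.validate()

    def has_valid_input_count(self, num_inputs):
        # Every Node needs exactly `num_in` inputs, so equality is enough.
        return num_inputs == self.num_in

    def execute(self, predecessor_data, flow_vars):
        # Subclasses that do real work are expected to override this.
        raise NotImplementedError()


class ReadCsvNode(Node):
    num_in = 0  # reads from disk, so no predecessors in this sketch
```

With this shape, a Node subclass only declares its options and input count; the parent class carries the validation logic.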
reddigari
left a comment
This is phenomenal refactoring both from an extensibility standpoint and general DRYness. Awesome work!
I am having a little trouble keeping all the flow variable- and options-wrangling methods straight in my head. We should think about a clear way to present the approach (within our class diagrams or elsewhere) for milestone 3 and the final defense.
Am I understanding correctly that local flow vars are just regular nodes in the graph, which are caught when parsing the predecessor input? Whereas global ones aren't connected to anything, but are available from within the node config form to override a particular parameter? If you could give me a high-level list of what needs to happen on the front end for this to be supported, that would help a ton! If I am understanding correctly, this includes:
The list you have hits most of the high-level needs of the front-end. I made a few in-depth comments to elaborate on these (and the endpoints to hit) in issues #64 and #65. In general, your understanding of how the flow variables work is correct. A Workflow contains:
The differentiator between the two is whether a Node has the Global variables are defined in a separate |
Factors out some duplicate code. PR #57 adds a `to_json()` method that might be able to replace the `extract_node_info()` method here. TBD.
diegostruk
left a comment
I think this is a great refactor! I added a minor comment and after running seems like everything is good. I will keep going through the changes but I am fine merging this.
    display_name = "Flow Control"

    def execute(self, predecessor_data, flow_vars):
        return
should this raise a NotImplementedError instead?
FlowNodes don't really have execute behavior like the other Nodes; their main logic lies in the get_replacement_value() method instead, so I don't think a NotImplementedError is needed here.
This also allows for the children (only the StringNode) to avoid adding empty execute() methods.
Node execution/validation
The main Node class now implements a validation method that checks all Parameters, calling their `option.validate()` methods. The calls to validate within `pyworkflow/workflow.py` have also been removed. This addresses the use case mentioned where a Node could potentially be invalid when added to the workspace. Now, Nodes are validated upon saving the configuration, or when the 'node update' endpoint is called.

Also changed is the order of operations for Node execution. The main logic in the `Workflow` class has been simplified to:

The thinking behind moving input data validation and flow variable substitution into the Workflow class is that anyone developing a custom Node would not have to duplicate this work. Even for the built-in Nodes there was a lot of duplication where each Node validated input and parameters.
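As a rough sketch of that division of labor: the Workflow side validates input and resolves options, then hands the result to the Node. Only `validate_input_data`, `get_execution_options`, and `execute` are names from this PR. The `execute_node` helper, the `num_in` attribute, the `option_replace` mapping, and the `JoinTextNode` example are hypothetical, and the real step order may differ.

```python
class Node:
    num_in = 1  # assumed attribute: required number of predecessors

    def __init__(self, options=None, option_replace=None):
        self.options = dict(options or {})
        # hypothetical: option name -> name of the flow variable overriding it
        self.option_replace = dict(option_replace or {})

    def validate_input_data(self, num_input_data):
        if num_input_data != self.num_in:
            raise ValueError(
                f"expected {self.num_in} inputs, got {num_input_data}"
            )

    def get_execution_options(self, flow_vars):
        # Start from the stored options, then apply any flow variable overrides.
        resolved = dict(self.options)
        for option_name, var_name in self.option_replace.items():
            if var_name in flow_vars:
                resolved[option_name] = flow_vars[var_name]
        return resolved

    def execute(self, predecessor_data, flow_vars):
        raise NotImplementedError()


def execute_node(node, predecessor_data, flow_vars):
    """Hypothetical Workflow-side helper: shared checks first, then execute."""
    node.validate_input_data(len(predecessor_data))
    execution_options = node.get_execution_options(flow_vars)
    return node.execute(predecessor_data, execution_options)


class JoinTextNode(Node):
    """A custom Node only has to implement its own logic."""

    num_in = 2

    def execute(self, predecessor_data, flow_vars):
        # `flow_vars` here holds the already-resolved options.
        return flow_vars["sep"].join(predecessor_data)
```

A custom Node such as `JoinTextNode` contains no validation or substitution code of its own, which is the duplication this change aims to remove.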
The `validate_input_data` and newly-named `get_execution_options` were moved from the `NodeUtils` class into the main `Node` class. This approach seems cleaner to me, but there may be use cases I'm not considering.

Flow variable substitution
There are currently no front-end options to create or pass these flow variables around, but this PR should handle most of the back-end functionality and updates the endpoint responses for the front-end to use. Currently, flow variables work by adding the following attributes to a Node (for example `ReadCsv`), written here in JSON:

On execution, the back-end would substitute the `default_value` stored in the global `FlowNode` with ID "2" for whatever the current value is for "sep" in the `ReadCsv` Node. Should the user want to revert "sep" back to whatever stored value was there, they would remove the flow variable (perhaps a checkbox, or empty selection from a dropdown?).

To pull this information into the "Node Configuration" pop-up in the front-end, the action would need to call the `GET /node/{node_id}` endpoint, which has been updated to return the Node's information as well as any flow variable options (all global variables, and any local variables connected as predecessors).

TODO:
- This PR updates all Nodes to use the `Parameter` classes @reddigari introduced in #53. Only `Read/WriteCsv` and the `JoinNode` have been updated to use these Parameters during execution; the rest still pass in `**self.options`, which may cause issues.
- There is no validation to ensure a FlowNode matches the Parameter type of a given option. E.g., there are no checks to prevent a `StringNode` value from replacing a `BooleanParameter` value.
- Integration with front-end. Nested forms might be the solution to provide the JSON data the back-end currently is using, but both back end and front end might need some changes as we integrate.
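As a rough illustration of the flow variable substitution described in this PR, the sketch below resolves "sep" for a `ReadCsv` Node from a global flow variable with ID "2". The `default_value` field, the "sep" option, and the ID "2" come from the description; the `option_replace`, `node_id`, and `is_global` keys, the dictionary layout, and the `substitute_flow_vars` function are assumptions and may not match the JSON the back-end actually uses.

```python
import copy

# Hypothetical Node configuration; key names other than "sep" are assumed.
node_config = {
    "name": "ReadCsv",
    "options": {"sep": ","},
    "option_replace": {"sep": {"node_id": "2", "is_global": True}},
}

# Hypothetical global flow variables, keyed by Node ID.
global_flow_vars = {"2": {"node_key": "StringNode", "default_value": ";"}}


def substitute_flow_vars(node_config, global_flow_vars, local_flow_vars=None):
    """Return the options to execute with; stored options are left untouched."""
    local_flow_vars = local_flow_vars or {}
    options = copy.deepcopy(node_config["options"])
    for option_name, ref in node_config.get("option_replace", {}).items():
        source = global_flow_vars if ref.get("is_global") else local_flow_vars
        flow_node = source.get(ref["node_id"])
        if flow_node is None:
            # Unknown flow variable: fall back to the stored value.
            continue
        options[option_name] = flow_node["default_value"]
    return options


resolved = substitute_flow_vars(node_config, global_flow_vars)
```

Because the stored "sep" is never overwritten in this sketch, removing the entry from `option_replace` reverts the Node to its saved value. Note that nothing here checks that the flow variable's type matches the option's Parameter type, which is the gap the second TODO item describes.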