Initial FlowNode implementation for local/global flow variables by reelmatt · Pull Request #42 · PyWorkflowApp/visual-programming

reelmatt · 2020-04-05T18:41:38Z

How FlowNodes work as flow variables

Global flow variables are associated with a workflow (in KNIME, right-click on workflow -> 'Workflow Variables...') and local flow variables behave like regular nodes in the graph (with special red circle ports indicating a flow variable).

In KNIME, the two key pieces of information a flow variable provides are the 'var_name' and the 'default_value'. In the context of pyworkflow, the 'var_name' would substitute a Node option (e.g. 'filepath_or_buffer' for ReadCsvNode) and the 'default_value' would replace the value associated with that option.

FlowNodes are currently similar to any other Node, and local FlowNodes are created the same way. To create a global FlowNode, is_global: true is passed in the POST body when creating the node. Read/update/delete are performed the same way as other nodes, just at the /node/global/<node_id> endpoint. Global FlowNodes are stored in a new NetworkX Graph that is added to the Workflow and saved with the JSON output.

Existing endpoints should behave the same way, but some function signatures were changed slightly:

node.py: execute() now takes a flow_vars parameter
workflow.py: retrieve_node_data() is still a static method, but accepts a Node object as the argument instead of (workflow, node_id). This moves Node lookup out of the retrieval step and prevents cases where the same Node is retrieved twice.
workflow.py: to_graph_json() is now a static method that accepts a NetworkX graph, to work for both the 'graph' (workflow) and 'flow_vars'.

Limitations/TODO

This initial implementation should serve as a good starting point for further development. Currently, if a FlowNode is connected to a Node's in-port, the FlowNode variable will replace that option within the Node. Instead, this variable should be provided as an option the user can select when configuring the Node (i.e. use this manual value OR the value from the local flow variable).

Likewise, one can perform all CRUD operations on global FlowNodes, but they cannot be connected to actual Nodes. The /workflow/globals endpoint returns the list of currently defined global flow variables which should be provided as a drop-down for appropriate fields when configuring a Node.

This gets at @reddigari's point from #41 re: passing Node options into pandas calls as kwargs. We will probably want to investigate another approach.

There's also the question of different 'types' of ports. ReadCsvNode has 0 in-ports, but should accept 1 in-port for flow variables. The PivotNode has 1 in-port, but should have another in-port for flow variables. Is this a front-end approach where react-diagrams can prevent non-like ports from connecting? Does the back-end need to differentiate between port types or is knowing the type of Node connected enough?

Initial implementation for local/global flow variables. In KNIME, the two key pieces of information are the 'var_name' and the 'default_value'. Global vs. local will behave much the same way, so they are implemented as Nodes that can be added to the main workflow graph (local) or the new global graph (flow_vars).

Added to separate input for Node execution. Now accepts 'preceding_data' and 'flow_vars' as input to use during execution.

reddigari · 2020-04-05T18:57:06Z

@reelmatt react-diagrams allows different port subclasses, each implementing their own canLinkToPort() method --- this will let us validate connections. We can also parameterize the number of flow ports (e.g. num_flow_in), and render them on the visual node accordingly.

Awesome work here! Can't believe you got this going this so quickly.

reddigari

Not a big concern but something to keep an eye on: I must have missed when we introduced the use of Django file storage into pyworkflow (for storing and retrieving intermediate data frames on disk).

This is related to the first bug I worked around in #41 --- pyworkflow nodes need to know the absolute path of any files on disk, which were probably created with Django file storage. Obviously one fix (currently being used) is to use that API inside pyworkflow, but that's a separation of concerns issue (that would prevent us from using pyworkflow without the webserver entirely). My workaround in 21ae095 determines the absolute path from within the Django app, and passes it into the pyworkflow node factory. But I'm not sure whether that will work for all cases.

I'd say for now let's just keep it as is, but I think it's worth discussing before we get to CLI implementation.

Edit: see #43 for more thoughts on this

reddigari · 2020-04-05T21:07:53Z

pyworkflow/pyworkflow/node.py

        self.node_key = node_info.get('node_key')
        self.data = node_info.get('data')

+        self.is_global = True if node_info.get('is_global') else False


Very minor suggestion: self.is_global = node_info.get('is_global') is True (or make it a property)

Do we have a mechanism to designate a flow variable as global? If it's given as an option to be configured by the user, then this line will need to check self.options, right?

hcat-pge added 5 commits April 5, 2020 12:07

refactor: Move create_node method to node factory

0e55c2c

refactor: Change 'workflow_name' to @Property getter

c037679

Merge branch 'master' into dev/mthomas

8394990

feat: Use local FlowNodes to replace Node options

ba88a02

Added to separate input for Node execution. Now accepts 'preceding_data' and 'flow_vars' as input to use during execution.

reelmatt requested review from cesaragv, cesarnda, diegostruk, matthew-t-smith and reddigari April 5, 2020 18:41

reddigari approved these changes Apr 5, 2020

View reviewed changes

reddigari mentioned this pull request Apr 5, 2020

File naming/access in pyworkflow vs. Django vs. CLI #43

Closed

diegostruk approved these changes Apr 7, 2020

View reviewed changes

reddigari merged commit e3864d5 into master Apr 7, 2020

reelmatt deleted the dev/mthomas branch April 10, 2020 13:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial FlowNode implementation for local/global flow variables#42

Initial FlowNode implementation for local/global flow variables#42
reddigari merged 5 commits intomasterfrom
dev/mthomas

reelmatt commented Apr 5, 2020

Uh oh!

reddigari commented Apr 5, 2020

Uh oh!

reddigari left a comment •

edited

Loading

Uh oh!

reddigari Apr 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

reelmatt commented Apr 5, 2020

How FlowNodes work as flow variables

Limitations/TODO

Uh oh!

reddigari commented Apr 5, 2020

Uh oh!

reddigari left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reddigari Apr 5, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

reddigari left a comment •

edited

Loading