Skip to content

Draft a decision on nodes as outputs#56

Open
liamhuber wants to merge 4 commits intomainfrom
add_node_output
Open

Draft a decision on nodes as outputs#56
liamhuber wants to merge 4 commits intomainfrom
add_node_output

Conversation

@liamhuber
Copy link
Copy Markdown
Member

Following our experiment with decisions for pyiron, here is a summary of part of our discussion today.

For me, the line for when to use one of these instead of a regular github issue is quite blurry. At the moment, I'm focusing on things where I feel there is some risk that I misunderstood the conversation, and/or where the discussion has some reasonably significant impact on architecture. @hickel, I guess these also nicely go towards your point about documenting our conversations.

@jan-janssen, I've pinged you as well for review here because I'm not actually trying to propose or make any new decision, but only to accurately represent what was already discussed and decided. For that purpose, I don't think there's a need to involve both you and Joerg, but if either signs off that it's a fair summary I think that should be sufficient.

@liamhuber liamhuber requested review from JNmpi and jan-janssen May 16, 2025 20:12
@liamhuber liamhuber changed the title Draft a decision on outputs Draft a decision on nodes as outputs May 16, 2025
@JNmpi
Copy link
Copy Markdown
Contributor

JNmpi commented Jun 3, 2025

@liamhuber, could you please provide some specification/clarification of whether the proposal applies to the GUI only (there I can easily agree) or whether it has an impact also on the backend (i.e., do you suggest that the node gets an extra property, e.g. node.output_node) for which I do not see the need.

@liamhuber
Copy link
Copy Markdown
Member Author

My understanding was that we really wanted the nodes themselves to have the additional output in the back-end model. In this way we can form a true connection between that output port and the "functional" nodes that take it.

I agree that the front-end 100% needs this. We currently get away without it at the code-level, but I feel it still improves clarity there too. In particular, if we need it for the front end, it then makes the front- and back-end to be better aligned if we do it in both places. So if we want it in the front-end, the question for me is rather "why not?" do it also in the back-end. The receivers are responsible for making a copy, so there is no real computational cost, as this port is really just holding a reference to it's owner. As far as I can see, it will only improve clarity for the functional applications.

@JNmpi
Copy link
Copy Markdown
Contributor

JNmpi commented Jun 3, 2025

Do you have a suggestion for the syntax on the back-end? It would really help me to have something concrete. Specifically, I would like to see that we do not require users to learn additional keywords/syntax.

@liamhuber
Copy link
Copy Markdown
Member Author

I think we would need to reserve a port name, e.g. every node would come with a some_node.outputs._pyiron_node. I don't think we need to specify what we name it in the decision, but we'll want something that is some combination of clear and not likely to otherwise be a return value from the node.

For the GUI users there is no new syntax to learn, because you'd see the port in the GUI -- we could easily modify the displayed name to just "node" or whatever, and even give it a different colour despite the fact it is a normal edge type. For the code users, my opinion is that one new clearly defined name is easier to learn/remember than the concept that a data output port can conditionally be carrying its owning node if it is connected to a downstream port hinted Node, i.e. explicit>implicit.

@JNmpi
Copy link
Copy Markdown
Contributor

JNmpi commented Jun 3, 2025

For the GUI users I fully agree. I am also fine to not misuse the ports for getting the node, i.e. wf.node2(input=wf.node.outputs.my_output) should give an error if input is of type node (except my_output returns a node). However, I would strongly prefer to use the node itself, rather than using a new port, i.e., wf.node2(input=wf.node) rather than wf.node2(input=wf.node.outputs._pyiron_node).

@liamhuber
Copy link
Copy Markdown
Member Author

wf.node2(input=wf.node) is anyhow already syntactic sugar for creating a connection, and I don't see why we couldn't use it for creating a node-carrying-edge. If we won't have a port handle to specify the node-carrying edge, then in this decision we need to nail down the new edge type. I.e. currently edges are (source, target, sourceHandle, targetHandle): tuple[str, str, str, str]. My gut reaction is that the most sensible thing to use is (source, target, sourceHandle, targetHandle): tuple[str, str, str | None, str | None], where None values get interpreted that it is a "special edge" straight to the node.

I'm totally fine with the wf.node2(input=wf.node) syntactic sugar, but I still advise against using the node directly in the ground truth. This creates a new class of edge which @jan-janssen was concerned about, and while it has the benefit of not using up anything in the port namespace, it's still a special code the user needs to remember and understand just as if we gave it a special string name.

@JNmpi
Copy link
Copy Markdown
Contributor

JNmpi commented Jun 4, 2025

I like the idea of extending the edge definition to port handles being None (to mark a node input or output). It is perfectly symmetric, i.e., works for inputs or outputs accepting a node and can be easily translated to the GUI view (where we have a separate port for the node passing). I also don't foresee any major issues in the graph module.

I am not sure about your second statement regarding the ground truth. The ground truth is a graph representation consisting of nodes and edges. Explicit port labels occur only in the edges, not in the nodes. Thus, if we agree on the above extension of the edge definition, I don't see any issue. The beauty of the above extension of the edge definition is that we don't need a new class for this case and that it automatically connects to our present way of expressing the graph.

@liamhuber
Copy link
Copy Markdown
Member Author

What I mean by the ground truth is that I strongly advise the underlying model to use something like ("some_source", "some_target", "pyiron_node", "target_port") rather than ("some_source", "some_target", None, "target_port"). Conceptually, I find introducing new None values in the edges to be manageable but strictly worse than just using the existing paradigm and adding a new port.

What I am really lacking at this point is understanding the special benefits of trying to do this by introducing a new concept, when we already have existing concepts that handle the task just fine? In both cases we wind up with the ability to pass entire nodes as data, and in both cases we are free to create syntactic sugar so this is handled like wf.node2(node_as_input=wf.node). But when I compare the cons, I find adding a new port to be strictly superior and much easier to implement -- so much so that I can even demo it in a notebook after altering only one line of code in the code base:

Attack 1) Adding a self-like output port to every node

Cons:

  • Takes up a port namespace users cannot leverage
  • Users need to learn this port's name

Implementation:

  • Extend the simple_workflow.Node outputs by one port
  • Modify the "use node as output" syntactic sugar code to always ignore this port
    • Alternative: switch "use node as output" syntactic sugar to refer to the "self" data instead of "regular" data for single-return nodes -- i.e. we have design freedom to choose the meaning of the syntax wf.node2(input=wf.node), although it can only mean "the node is the data" XOR "the node's single output is the data", and the one we don't choose is only available by explicit reference to its port

In-notebook demo of the concept

import pyiron_nodes as pn
import pyiron_workflow as pwf
import pyiron_workflow.simple_workflow as sw

wf = pwf.Workflow("iter_wf")
wf.n = pn.math.Add(1, 2)

wf.n.outputs = sw.Data(
    {
        sw.PORT_LABEL: wf.n.outputs.data[sw.PORT_LABEL] + ["pyiron_node"],
        sw.PORT_TYPE: wf.n.outputs.data[sw.PORT_LABEL] + [sw.Node],
        sw.PORT_VALUE: wf.n.outputs.data[sw.PORT_LABEL] + [wf.n],
        "ready": wf.n.outputs.data["ready"] + [False],
        "node": wf.n.outputs.data["node"] + [wf.n],
    },
    attribute=sw.Port
)

wf.iter = pn.executors.IterNode(wf.n.outputs.pyiron_node, "y", [1, 2, 3], store=False)
wf.run()

Above demos the node extension, and how the new channel is used
To get the demo to run, I also had to start taking care of the second implementation point by modifying simple_workflow.Node.n_out_labels very slightly:

    @property
    def n_out_labels(self):
        return len(self.outputs.data[PORT_LABEL]) - ("pyiron_node" in self.outputs.data[PORT_LABEL])

Attack 2) Using the node itself

Cons:

  • New concept for users "data travels from port-to-port except when..."
  • No handle for defining edges, need to use something else (e.g. None)
    • Not immediately compliant with JSON (de)serialization -- needs bespoke parsing
    • Users need to learn this codename
  • No explicit reference available on the node -- nodes need to use themselves
    • "use node as output" becomes strictly impossible, i.e. we must convert to the syntax that wf.node2(input=wf.node) is receiving the entire node as input and not that node's single output

Implementation:

??? I don't know. I'm confident it is possible, but the path is not immediately clear to me.

@JNmpi
Copy link
Copy Markdown
Contributor

JNmpi commented Jun 10, 2025

Thanks for summarizing your thoughts and arguments. I have a different point of view, and I feel that many of the arguments against Attack 2 can easily be overcome. Below are some thoughts:

  • If I wanted to use node, I would find it cumbersome and unintuitive to use it via node.outputs.pyiron_node (or something similar). First, it would be difficult to explain to users why they need this detour. Second, it would clutter the list of ports and behave differently in many respects, e.g., it does not need to be run first and has the same name always.

  • Users do not have to learn new code names or syntax:

    • The backend would correctly translate wf.node2(node_input=wf.node, input=wf.node) to create an edge to the input port and link node_input to the node. This is controlled by the type hints. We could enforce that users be explicit and write input=wf.node.outputs.my_output, but I would refrain from such a solution.
    • Users never have to define edges explicitly; they would never encounter None as a port label. This is fully handled by the backend. Users only define edges via the code (see above) or in the GUI, where node edges can be easily handled.
    • Thus, for users, the input port is filled with data from the source node or the source node itself.

@liamhuber
Copy link
Copy Markdown
Member Author

@JNmpi, I have attempted to reflect your post from yesterday in an update to the decision, please re-review and let me know if further changes are needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants