diff --git a/.flake/pkgs/hpp2plantuml.nix b/.flake/pkgs/hpp2plantuml.nix new file mode 100644 index 0000000000..d5aba814f1 --- /dev/null +++ b/.flake/pkgs/hpp2plantuml.nix @@ -0,0 +1,11 @@ +{buildPythonPackage, fetchPypi}: + +buildPythonPackage rec { + pname = "hpp2plantuml"; + version = "0.8.5"; + format = "wheel"; + src = fetchPypi { + inherit pname version format; + sha256 = "sha256-PfTJmBypI21AAK3sMojygQfrhnRqcMmVCW4dxGfDfQg="; + }; +} diff --git a/flake.nix b/flake.nix index 484e8dbe9e..70146a750c 100644 --- a/flake.nix +++ b/flake.nix @@ -39,6 +39,7 @@ { packages = { legion = pkgs.callPackage ./.flake/pkgs/legion.nix { }; + hpp2plantuml = pkgs.python3Packages.callPackage ./.flake/pkgs/hpp2plantuml.nix { }; rapidcheckFull = pkgs.symlinkJoin { name = "rapidcheckFull"; paths = (with pkgs; [ rapidcheck.out rapidcheck.dev ]); @@ -100,6 +101,7 @@ ]) (with self.packages.${system}; [ legion + hpp2plantuml rapidcheckFull doctest ]) diff --git a/lib/utils/include/utils/graph/README.md b/lib/utils/include/utils/graph/README.md index c62b2df294..25b0103f9c 100644 --- a/lib/utils/include/utils/graph/README.md +++ b/lib/utils/include/utils/graph/README.md @@ -4,8 +4,8 @@ FlexFlow's graph library very intentionally attempts to balance performance and ease of use. The graph library aims to have a very simple external interface that is highly decoupled from the underlying representations, so performance and internal implementations can be tuned and modified over time without breaking the code that uses the library. -Because FlexFlow's graphs are not on the scale of machine memory or not so large that single traversals takes nontrivial time, the graph library intentially avoids performance opportunites that would expose many of these performance aspects to user code. 
-Of course, there are also some optimizations that simply have not been done due to time constraints: for example, algorithms currently are able to be specialized for the underlyign representation being used, but this could be added without modifying the user-side interface. +Because FlexFlow's graphs are not on the scale of machine memory or not so large that single traversals takes nontrivial time, the graph library intentionally avoids performance opportunities that would expose many of these performance aspects to user code. +Of course, there are also some optimizations that simply have not been done due to time constraints: for example, algorithms currently are able to be specialized for the underlying representation being used, but this could be added without modifying the user-side interface. ## Usage @@ -17,7 +17,7 @@ At their core, they are as follows: - `UndirectedGraph`: at most one edge allowed between every pair of nodes, edges are undirected - `DirectedGraph`: at most one edge allowed between every ordered pair of nodes, edges are directed (i.e., have a source node and a destination node) -- `MultiDiGraph`: arbitrary numbers of edges allowed between every pair of nodes, but each must have not only source/destination nodes but also _source/destination` indices_, which serve to disambiguate different edges between the same nodes. There can exist at most one edge for every ordered tuple of source node, destination node, source index, and destination index. +- `MultiDiGraph`: arbitrary numbers of edges allowed between every pair of nodes, but each must have not only source/destination nodes but also _source/destination indices_, which serve to disambiguate different edges between the same nodes. There can exist at most one edge for every ordered tuple of source node, destination node, source index, and destination index. Examples of the different graph variants are shown below. 
@@ -149,6 +149,7 @@ To add an edge between two nodes `Node n1` and `Node n2` to an `UndirectedGraph In `UndirectedGraph` the order of the arguments of `add_edge` doesn't matter as edges are undirected, but the order does matter for `DiGraph` and `MultiDiGraph`. `MultiDiGraph::add_edge` takes in two additional arguments of type `NodePort`, specifying the source and destination indices. Similar to `Node`s, `NodePort`s can be generated via `g.add_node_port()`. +`NodePort`: an opaque object used within `MultiDiGraph` to disambiguate between multiple edges. `MultiDiGraph` will be able to distinguish between 2 edges that share the same source and destination as long as at least one `NodePort` differs. Within the context of a PCG, `NodePort`s should be thought of as the various inputs and outputs of a single node. The last paragraph covered the base API used to write to graphs, but we also want to be able to read from graphs. Reading from graphs is implemented with the `query_nodes` and `query_edges` methods, which can be thought of as executing a database query over the nodes and edges of the target graph, respectively (where queries are restricted to an incredibly simple set of operations). @@ -179,6 +180,16 @@ Generally users will use underlying representations provided by the graph librar [^1]: At some point we will likely add actual runtime checks on this, but for now we rely on the user not to mess up. Currently the implementation will keep going silently until the incorrectness grows so large that something breaks/crashes. [^2]: See if you're not familiar with the term _type coercion_ +### Open, Upward, Downward + +`Open` is meant in a sense similar to the topological one: an open graph is a graph that contains some edges where one of the 2 nodes is not present in the graph itself. 
+We can further specify the "openness" of a **directed** graph by specifying whether it is `UpwardOpen` (so some of the incoming edges are open) or `DownwardOpen` (so some of the outgoing edges are open). + +![Open graphs inheritance diagram](docs/open.svg) + +Arrows with pointed tips indicate inheritance, while arrows with square tips indicate that the pointing class has a `cow_ptr` of the type of the pointed-to class (for more info, see [cow_ptr](#cow_ptr-and-interfaces)). + + ### Labelled Graphs As nice as all of the above is, graphs without labels are mostly useless--in practice, nodes and edges represent some other system and the properties of that system (or at least a way to map the result of graph algorithms back to the underlying system) are necessary. @@ -191,6 +202,46 @@ As such, the labelled graph types provide the typical `at` method (as on `std::u [^3]: `operator[]` currently is not present because all nodes must have labels and we don't require label types to be default constructible, though some simple template programming could probably add `operator[]` support in the cases where the label types _are_ default constructible. +![Labelled Graphs Inheritance Diagram](docs/labelled.svg) + + + ## Internals -TODO @lockshaw +Most of the major graph classes in the library come in sets of 4. For a given class `ClassName` we have: +1. `ClassName` +2. `ClassNameView` +3. `IClassName` +4. `IClassNameView` + +General rules which apply to most classes: +- `ClassName` (virtually) inherits from `ClassNameView`. Similarly, `IClassName` (virtually) inherits from `IClassNameView`. +- `ClassName` has, as a member variable, a `cow_ptr` of type `IClassName`. The same holds for `ClassNameView`. +Thus, the bulk of the inheritance that actually extends functionality is present among `IClassNameView` classes. + + +### cow_ptr and Interfaces + +The reason for the existence of the `View` variants has been explained in previous sections. 
+The existence of the `I(nterface)` variants stems from C++'s approach to modeling polymorphism. + +C++ polymorphism is achieved at runtime through the use of [virtual functions](https://www.learncpp.com/cpp-tutorial/virtual-functions/), which allow for a single function defined on some superclass to also work correctly on its subclasses. + +To create objects with polymorphic behaviour, we use the following syntax: +`BaseClass* obj = new DerivedClass(); //or alternatives such as std::shared_ptr<BaseClass> obj = std::make_shared<DerivedClass>();` +Any call to one of `obj`'s virtual member functions is resolved at runtime (dynamic binding), with C++ calling the most derived implementation of the function. + +While this pattern works nicely, the way instantiation is done leaves the burden of memory management on the user. +To address this, graph classes store a `cow_ptr` as a member variable, which points to an instance of their corresponding interface class. + +All member functions present in `ClassName` and `ClassNameView` delegate their calls to their corresponding interface classes (which implement the actual logic), meaning that these classes essentially act as wrappers to their interface counterparts. + +To create graphs within the library, we thus use the following syntax: +`BaseGraph obj = BaseGraph::create<DerivedGraph>();` + +This results in an object that, while of type `BaseGraph`, can access at runtime the member functions defined in `DerivedGraph`. + +### Virtual Inheritance +Due to the complexity of the graph library, diamond-style inheritance patterns emerge (consider, for example, the `OutputLabelledOpenMultiDiGraphView` class, which inherits from both `NodeLabelledOpenMultiDiGraphView` and `OutputLabelledMultiDiGraphView`, which in turn both inherit from `NodeLabelledMultiDiGraphView`). +In a diamond inheritance pattern, C++ by default instantiates a separate copy of the shared base class for each inheritance path whenever we instantiate a derived class. 
+To address this issue, we employ [Virtual Inheritance](https://en.wikipedia.org/wiki/Virtual_inheritance), which removes the ambiguity associated with the multiple copies. diff --git a/lib/utils/include/utils/graph/docs/edges.svg b/lib/utils/include/utils/graph/docs/edges.svg new file mode 100644 index 0000000000..0e01479dc2 --- /dev/null +++ b/lib/utils/include/utils/graph/docs/edges.svg @@ -0,0 +1 @@ +EdgesDiInputDiOutputDirectedEdgeInputMultiDiEdgeMultiDiEdgeMultiDiInputMultiDiOutputOutputMultiDiEdgeUndirectedEdge \ No newline at end of file diff --git a/lib/utils/include/utils/graph/docs/generate_diagram.py b/lib/utils/include/utils/graph/docs/generate_diagram.py new file mode 100644 index 0000000000..5a4fa2e456 --- /dev/null +++ b/lib/utils/include/utils/graph/docs/generate_diagram.py @@ -0,0 +1,135 @@ +''' +Script to generate a PlantUML graph for the inheritance / dependency hierarchy between the graph classes +Modify the `headers` and `selected_groups` variables to generate different diagrams +''' + +import re +from dataclasses import dataclass +from collections import defaultdict +from hpp2plantuml import CreatePlantUMLFile +import os + +@dataclass +class Component: + name: str + rawstring: str + +def clean_puml(puml : bytes) -> str: + puml = puml.decode().split('\n') + puml = filter(lambda string : all(not string.strip(' \t').startswith(char) for char in '+-#'), puml) #remove info related to class members + puml = (line.strip('\t') for line in puml) + puml = '\n'.join(puml) + puml = puml.replace(" {\n}", '') + puml = re.sub(r' <.*?<.*?>>', '', puml) #remove the templates + return puml + +def remove_enum(puml): + return puml.replace('\nenum LRDirection {\nLEFT\nRIGHT\n}\n', '') + + +def remove_namespace(puml): + pattern = r'namespace FlexFlow {([^}]*)}' + puml = re.sub(pattern, lambda x: x.group(1).strip(), puml, flags=re.DOTALL) + puml = puml.replace('FlexFlow.', '') + return puml + +def get_components(puml): + components = [] + for line in 
puml.split('\n'): + if 'class' in line: + name = re.sub(r'\b(?:class|abstract\s+class)\b ', '', line) + components.append(Component(name, line)) + return components + +def get_additional_cowptr_connections(components): + extra_connections = [] + names = {c.name for c in components} + for name in names: + if 'I'+name in names: + extra_connections.append(f'I{name} *-- {name}') + return extra_connections + +def get_connections(puml, includeaggregation=False): + pattern = '--' if includeaggregation else '<|--' + connections = [] + for line in puml.split('\n'): + if pattern in line: + connections.append(line) + return connections + +def filter_by_groups(groups, components): + component_classifications = defaultdict(list) + filtered_components = [] + for component in components: + for packagename in groups: + filtering_func = GROUPS[packagename] + if filtering_func(component.name): + component_classifications[packagename].append(component) + filtered_components.append(component) + break + return component_classifications, filtered_components + + +def filter_connections(connections, components): + filtered_connections = [] + component_names = {comp.name for comp in components} + for conn in connections: + parent, _, child = conn.split(' ') + if parent in component_names and child in component_names: + filtered_connections.append(conn) + return filtered_connections + +if __name__=='__main__': + + # Provide directory path(s) and selected_groups to generate the corresponding puml file + headers = ["../labelled/*.h", "../*.h"] + selected_groups = ('Labelled','Labelled.NodeLabelled','Labelled.OutputLabelled') + output_filename = 'output.puml' + + selected_groups = sorted(selected_groups, reverse=True) #to ensure that classification for subcategories is given precedence + GROUPS = { + 'Graph' : lambda comp : 'Graph' in comp, + 'Edges' : lambda comp : any(comp.endswith(pattern) for pattern in ('Input', 'Output', 'Edge')), + 'Open' : lambda comp : 'Open' in comp and 'Query' not 
in comp, # doesn't include Upwards or Downwards + 'Open.Upward' : lambda comp : 'Upward' in comp and 'Query' not in comp, + 'Open.Downward' : lambda comp : 'Downward' in comp and 'Query' not in comp, + 'DiGraphs.MultiDiGraphs' : lambda comp : 'MultiDiGraph' in comp, + 'DiGraphs' : lambda comp : 'DiGraph' in comp, + 'Undirected' : lambda comp : 'UndirectedGraph' in comp, + + 'Labelled' : lambda comp : 'Labelled' in comp, + 'Labelled.NodeLabelled' : lambda comp : 'NodeLabelled' in comp, + 'Labelled.OutputLabelled' : lambda comp : 'OutputLabelled' in comp + } + TEMP_FILENAME = 'generate_diagram_temp.puml' + + CreatePlantUMLFile(headers, output_file = TEMP_FILENAME) + + with open(TEMP_FILENAME, 'rb') as tempfile: + puml : bytes = tempfile.read() + os.remove(TEMP_FILENAME) + + puml = clean_puml(puml) + puml = remove_enum(puml) + puml = remove_namespace(puml) + + components = get_components(puml) + connections = get_connections(puml) + cowptr_connections = get_additional_cowptr_connections(components) + connections += cowptr_connections + + packageclassification, components = filter_by_groups(selected_groups, components) + connections = filter_connections(connections, components) + + final_puml = "" + final_puml += "@startuml\nleft to right direction\n\n" + + for packagename, components in packageclassification.items(): + component_string = '\n'.join(f'\t{c.rawstring}' for c in components) + final_puml+=f'package {packagename} {{ \n{component_string} \n}}\n\n' + + final_puml+='\n'.join(connections) + final_puml+="\n\n@enduml" + print(final_puml) + with open(output_filename, 'w') as file: + file.write(final_puml) diff --git a/lib/utils/include/utils/graph/docs/labelled.svg b/lib/utils/include/utils/graph/docs/labelled.svg new file mode 100644 index 0000000000..a439c85c04 --- /dev/null +++ b/lib/utils/include/utils/graph/docs/labelled.svg @@ -0,0 +1 @@ 
+LabelledNodeLabelledOutputLabelledILabelledMultiDiGraphILabelledMultiDiGraphViewLabelledMultiDiGraphLabelledMultiDiGraphViewLabelledMultiDiSubgraphViewINodeLabelledMultiDiGraphINodeLabelledMultiDiGraphViewINodeLabelledOpenMultiDiGraphINodeLabelledOpenMultiDiGraphViewNodeLabelledMultiDiGraphNodeLabelledMultiDiGraphViewNodeLabelledMultiDiSubgraphViewNodeLabelledOpenMultiDiGraphNodeLabelledOpenMultiDiGraphViewUnorderedNodeLabelledOpenMultiDiGraphIOutputLabelledMultiDiGraphIOutputLabelledMultiDiGraphViewIOutputLabelledOpenMultiDiGraphIOutputLabelledOpenMultiDiGraphViewOutputLabelledMultiDiGraphOutputLabelledMultiDiGraphViewOutputLabelledOpenMultiDiGraphOutputLabelledOpenMultiDiGraphViewOutputLabelledOpenMultiDiSubgraphViewUnorderedOutputLabelledMultiDiGraphUnorderedOutputLabelledOpenMultiDiGraphViewMultiDiGraphAsOutputLabelledViewOutputLabelledAsOutputLabelledOpen \ No newline at end of file diff --git a/lib/utils/include/utils/graph/docs/open.svg b/lib/utils/include/utils/graph/docs/open.svg new file mode 100644 index 0000000000..87766063f4 --- /dev/null +++ b/lib/utils/include/utils/graph/docs/open.svg @@ -0,0 +1 @@ +OpenDownwardUpwardAdjacencyOpenMultiDiGraphIOpenMultiDiGraphIOpenMultiDiGraphViewOpenMultiDiGraphOpenMultiDiGraphViewOpenMultiDiSubgraphViewViewMultiDiGraphAsOpenMultiDiGraphDownwardOpenMultiDiGraphDownwardOpenMultiDiGraphViewDownwardOpenMultiDiSubgraphViewIDownwardOpenMultiDiGraphIDownwardOpenMultiDiGraphViewIUpwardOpenMultiDiGraphIUpwardOpenMultiDiGraphViewUpwardOpenMultiDiGraphUpwardOpenMultiDiGraphViewUpwardOpenMultiDiSubgraphView \ No newline at end of file diff --git a/lib/utils/include/utils/graph/docs/undirected.svg b/lib/utils/include/utils/graph/docs/undirected.svg new file mode 100644 index 0000000000..f04d893a45 --- /dev/null +++ b/lib/utils/include/utils/graph/docs/undirected.svg @@ -0,0 +1 @@ 
+UndirectedHashmapUndirectedGraphIUndirectedGraphIUndirectedGraphViewJoinedUndirectedGraphViewUndirectedGraphUndirectedGraphViewViewDiGraphAsUndirectedGraphViewUndirectedGraphAsDiGraph \ No newline at end of file
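The wrapper-over-interface pattern that the README's "cow_ptr and Interfaces" and "Virtual Inheritance" sections describe might look roughly like the sketch below. Every name in it (`IToyGraphView`, `IToyGraph`, `HashmapToyGraph`, `ToyGraph`, `run_example`) is hypothetical and not part of the graph library, and `std::shared_ptr` only stands in for the library's `cow_ptr`, so no copy-on-write behaviour is modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <utility>

// Hypothetical read-only interface (plays the role of an `IClassNameView`).
struct IToyGraphView {
  virtual std::size_t num_nodes() const = 0;
  virtual ~IToyGraphView() = default;
};

// Hypothetical read-write interface (plays the role of an `IClassName`).
// Virtual inheritance keeps a single IToyGraphView subobject even if a
// diamond forms further down the hierarchy.
struct IToyGraph : virtual public IToyGraphView {
  virtual int add_node() = 0;
};

// One concrete backing representation that implements the interface.
struct HashmapToyGraph : public IToyGraph {
  std::size_t num_nodes() const override { return nodes.size(); }
  int add_node() override {
    int id = next_id++;
    nodes.insert(id);
    return id;
  }

private:
  std::unordered_set<int> nodes;
  int next_id = 0;
};

// Value-like wrapper: owns the interface object and forwards every call.
// ASSUMPTION: std::shared_ptr is a stand-in for cow_ptr.
class ToyGraph {
public:
  template <typename Impl, typename... Args>
  static ToyGraph create(Args &&...args) {
    return ToyGraph(std::make_shared<Impl>(std::forward<Args>(args)...));
  }

  std::size_t num_nodes() const { return ptr->num_nodes(); }
  int add_node() { return ptr->add_node(); }

private:
  explicit ToyGraph(std::shared_ptr<IToyGraph> p) : ptr(std::move(p)) {}
  std::shared_ptr<IToyGraph> ptr;
};

// Builds a graph through the wrapper, adds two nodes, returns the node count.
std::size_t run_example() {
  ToyGraph g = ToyGraph::create<HashmapToyGraph>();
  assert(g.num_nodes() == 0);
  g.add_node();
  g.add_node();
  return g.num_nodes();
}
```

Calling `run_example()` from a `main` exercises the whole path: user code only ever names `ToyGraph`, while the virtual calls dispatch to `HashmapToyGraph`, so the backing representation could be swapped without touching callers.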