[Unity][Analysis] Dataflow analysis framework and liveness analysis #15689

slyubomirsky · 2023-09-06T21:44:18Z

As part of #15319, this PR implements liveness analysis on top of a dataflow analysis framework similar to the one described by Adrian Sampson in these lecture notes: https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/4/. See also Chapter 5 of Static Program Analysis by Møller and Schwartzbach.

This required, for good or ill, introducing quite a bit of infrastructure. Here is a summary:

A control flow graph representation, which represents a function in terms of bindings. This maps closely to the notion of bindings within Relax's AST, but the mapping is not one-to-one: we also need to represent the body field of SeqExprs, and for an If node we have to consider the condition, the true branch, the false branch, and the merge point at the end.
An implementation of the general dataflow framework from the sources listed.
An implementation of liveness analysis using Sampson's approach (which ended up being very compact thanks to the dataflow framework). The liveness analysis results are given in terms of a set of live variables per node in the control flow graph. A helper function is included for mapping between the AST and the CFG.
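
To make the summary above concrete, here is a minimal standalone sketch of backward liveness over a per-binding CFG. It is not the code in this PR: `Node`, `LiveIn`, and `ExampleCfg` are hypothetical names, variables are plain strings rather than relax::Var, and the toy graph lets both branches write the same name `t` so that the merge node has something to read.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

using VarSet = std::set<std::string>;

// Hypothetical stand-in for one CFG node (a binding, condition, merge point, or body).
struct Node {
  VarSet defs;                     // variables bound at this node
  VarSet uses;                     // variables read at this node
  std::vector<std::size_t> succs;  // indices of successor nodes
};

// Returns, for each node, the variables live on entry to it:
//   live_in[n] = uses[n] + ((union of live_in over successors of n) - defs[n])
std::vector<VarSet> LiveIn(const std::vector<Node>& cfg) {
  std::vector<std::vector<std::size_t>> preds(cfg.size());
  for (std::size_t i = 0; i < cfg.size(); ++i) {
    for (std::size_t s : cfg[i].succs) {
      assert(s < cfg.size());
      preds[s].push_back(i);
    }
  }
  std::vector<VarSet> live_in(cfg.size());
  std::deque<std::size_t> worklist;
  for (std::size_t i = cfg.size(); i-- > 0;) worklist.push_back(i);  // exits first
  while (!worklist.empty()) {
    const std::size_t n = worklist.front();
    worklist.pop_front();
    VarSet live;
    for (std::size_t s : cfg[n].succs) live.insert(live_in[s].begin(), live_in[s].end());
    for (const std::string& d : cfg[n].defs) live.erase(d);
    live.insert(cfg[n].uses.begin(), cfg[n].uses.end());
    if (live != live_in[n]) {
      // The set grew, so every predecessor has to be recomputed.
      live_in[n] = std::move(live);
      for (std::size_t p : preds[n]) worklist.push_back(p);
    }
  }
  return live_in;
}

// Toy diamond: binding, If condition (split), two branches, merge point, body.
std::vector<Node> ExampleCfg() {
  return {
      Node{{"x"}, {"a"}, {1}},  // 0: x = f(a)
      Node{{}, {"c"}, {2, 3}},  // 1: If condition reads c
      Node{{"t"}, {"x"}, {4}},  // 2: true branch
      Node{{"t"}, {"a"}, {4}},  // 3: false branch
      Node{{"r"}, {"t"}, {5}},  // 4: merge point binds r
      Node{{}, {"r"}, {}},      // 5: SeqExpr body reads r
  };
}
```

Calling `LiveIn(ExampleCfg())` yields one set per node, indexed the same way as the graph, which is the shape of result described above (a set of live variables per CFG node).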

I highly encourage review, since I am a bit nervous about introducing new infrastructure; I am especially concerned about having clear naming (I am worried about having a name so close to the dataflow pattern matching). I felt it was unavoidable in this case, because there does not seem to be any other way for us to annotate program locations with liveness information. However, I am very glad to have implemented the general dataflow framework, since it could also be used for alias analysis and likely further analyses down the line.

Lunderberg

I really like this functionality, and there are enough partial implementations of it floating around that it would be good to consolidate them.

The implementation looks pretty solid to me, with most of my comments focusing on usability and extensibility.

Lunderberg · 2023-10-02T18:26:11Z

include/tvm/relax/dataflow_analysis.h

+ *  2. The condition expression in an If node (a "split" point)
+ *  3. A merge point (the variable to which an If node is bound: it is a "merge" between
+ *     the SeqExprs in the true and false branches)
+ *  4. The body expression in a SeqExpr (not actually bound)


Does this imply that the analysis can only be applied to normalized relax expressions? If non-normalized, the body of a SeqExpr is bound to a variable in the containing BindingBlock or DataflowBlock.

Yes, the expectation is for expressions to be normalized first.

Lunderberg · 2023-10-02T18:44:55Z

include/tvm/relax/analysis.h

+ *   so use `ExtractCFG` and `GetBindingIndex` to match locations in `fn`
+ *   to indices in the result.
+ */
+Array<Array<Var>> LivenessAnalysis(const Function& fn);


Instead of Array<Array<Var>>, could we return Map<Var, Array<Var>>? That is, a map from the variable being bound to the list of variables that are live while the value of that variable is being computed. That would avoid requiring a user of the function to know the internal indexing scheme, and most of the APIs have easy access to the Var (e.g., in a mutator that implements ExprMutator::VisitBinding).

Good idea, I'll see about trying it.

I've looked into this and what's tricky is dealing with cases like SeqExpr body values and If conditions and merge/split points. The body does not have a var associated with it and in the case of Ifs, there is more than one CFG entry associated with one binding. Var alone would not be a good key, which is why I had the indices in the first place. We could use a CFGKey object that contains this auxiliary info or otherwise keep a data structure around for the reverse mapping, potentially.
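
One way the CFGKey idea could look, as a rough sketch only: nothing below exists in the PR. `CFGKey`, `NodeKind`, `LiveSets`, and `LiveAt` are hypothetical, strings stand in for relax::Var, and `kBinding` is an assumed name for the ordinary-binding case.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>

using VarSet = std::set<std::string>;

// Kinds of CFG entry: an ordinary binding (assumed), an If condition,
// an If merge point, and a SeqExpr body.
enum class NodeKind { kBinding, kIfCond, kIfMerge, kSeqBody };

// Hypothetical key: the kind disambiguates the several CFG entries that
// belong to one If binding, and gives the unbound SeqExpr body a slot.
struct CFGKey {
  NodeKind kind;
  std::string owner;  // bound variable, or a label for the enclosing SeqExpr

  bool operator<(const CFGKey& other) const {
    return std::tie(kind, owner) < std::tie(other.kind, other.owner);
  }
};

using LiveSets = std::map<CFGKey, VarSet>;

// Looks up the live set for a key that is expected to be present.
const VarSet& LiveAt(const LiveSets& live, const CFGKey& key) {
  auto it = live.find(key);
  assert(it != live.end());
  return it->second;
}

// Made-up results for illustration: the If bound to r owns two entries.
LiveSets ExampleLiveSets() {
  LiveSets live;
  live[CFGKey{NodeKind::kIfCond, "r"}] = VarSet{"a", "c", "x"};
  live[CFGKey{NodeKind::kIfMerge, "r"}] = VarSet{"t"};
  live[CFGKey{NodeKind::kSeqBody, "main"}] = VarSet{"r"};
  return live;
}
```

With a key like this, the condition and merge point of the same If stay separate entries even though they share a variable, which is the case a bare Var key cannot express.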

Lunderberg · 2023-10-02T18:47:14Z

src/relax/analysis/dataflow_analysis.cc

+  // This is an inefficient linear scan; it could be improved by keeping a map of
+  // SeqExprs to indices in the CFG data structure.
+  // That should be considered if this function poses performance issues (unlikely).
+  for (size_t i = 0; i < cfg->bindings.size(); i++) {


Another advantage of using a relax::Var to specify the location at which that variable is bound: we would avoid the possible future problem of the linear scan being inefficient.
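
For comparison, the reverse map mentioned in the code comment could be built once and queried in constant expected time. The sketch below is only an illustration under assumptions: `BindingIndex` and `kNotFound` are hypothetical names, and bound variables are represented by their names rather than by relax::Var or SeqExpr objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel returned when a variable has no CFG entry (hypothetical).
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Hypothetical reverse index from a bound variable to its position in the CFG.
class BindingIndex {
 public:
  // bound_vars[i] is the variable bound at CFG position i.
  explicit BindingIndex(const std::vector<std::string>& bound_vars) {
    for (std::size_t i = 0; i < bound_vars.size(); ++i) {
      const bool inserted = index_.emplace(bound_vars[i], i).second;
      assert(inserted);  // each variable is expected to be bound only once
      (void)inserted;
    }
  }

  // Hash lookup instead of scanning every binding.
  std::size_t Find(const std::string& var) const {
    auto it = index_.find(var);
    return it == index_.end() ? kNotFound : it->second;
  }

 private:
  std::unordered_map<std::string, std::size_t> index_;
};
```

The one-time construction cost is linear in the number of bindings; after that, each lookup avoids the scan entirely.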

Lunderberg · 2023-10-02T18:57:41Z

include/tvm/relax/dataflow_analysis.h

+ *   each binding in the CFG) and the second being the "output map" (the domain
+ *   being passed *out of* the corresponding binding)
+ */
+std::pair<Array<ObjectRef>, Array<ObjectRef>> DataflowAnalysis(


Should this function be externally exposed? From the changes in this PR on its own, it looks like an implementation detail for LivenessAnalysis, but the function signature suggests that it is intended for more general use.

I think you're right, the use would be mainly internal. It might be good to expose it for testing, or to allow for defining analyses in Python.
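
For readers weighing whether to expose it, this is roughly the shape such a general entry point takes: a graph, an initial value, a transfer function, a merge function, and a direction, returning the in and out maps as a pair. The sketch is a simplified round-robin solver written for this discussion, not the PR's DataflowAnalysis; `Graph`, `Domain`, `Transfer`, `Merge`, and `RunDataflow` are all hypothetical, and the domain is fixed to a set of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Domain = std::set<std::string>;
using Transfer = std::function<Domain(std::size_t node, const Domain& input)>;
using Merge = std::function<Domain(const Domain&, const Domain&)>;

// Hypothetical graph: preds[i] and succs[i] list the neighbours of node i.
struct Graph {
  std::vector<std::vector<std::size_t>> preds;
  std::vector<std::vector<std::size_t>> succs;
};

// Iterates to a fixed point. in_map[i] is the value before node i and
// out_map[i] the value after it, in program order, for both directions.
std::pair<std::vector<Domain>, std::vector<Domain>> RunDataflow(
    const Graph& g, const Domain& init, const Transfer& transfer,
    const Merge& merge, bool forward) {
  assert(g.preds.size() == g.succs.size());
  const std::size_t n = g.succs.size();
  std::vector<Domain> in_map(n, init), out_map(n, init);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t step = 0; step < n; ++step) {
      const std::size_t idx = forward ? step : n - 1 - step;
      // Forward analyses read what predecessors produced; backward ones
      // read what successors produced.
      const std::vector<std::size_t>& prev = forward ? g.preds[idx] : g.succs[idx];
      const std::vector<Domain>& produced = forward ? out_map : in_map;
      Domain input = init;  // used as-is when there is no neighbour
      for (std::size_t k = 0; k < prev.size(); ++k) {
        input = (k == 0) ? produced[prev[k]] : merge(input, produced[prev[k]]);
      }
      Domain output = transfer(idx, input);
      Domain& input_slot = forward ? in_map[idx] : out_map[idx];
      Domain& output_slot = forward ? out_map[idx] : in_map[idx];
      if (input != input_slot || output != output_slot) {
        input_slot = std::move(input);
        output_slot = std::move(output);
        changed = true;
      }
    }
  }
  return std::make_pair(std::move(in_map), std::move(out_map));
}
```

A forward "variables defined so far" analysis would pass a transfer function that adds the node's bound variable and set union as the merge; liveness would pass `forward = false` with a transfer function that removes definitions and adds uses. Termination relies on the transfer and merge functions being monotone over a finite domain.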

Lunderberg · 2023-10-02T20:09:58Z

include/tvm/relax/dataflow_analysis.h

+  TVM_DECLARE_BASE_OBJECT_INFO(ControlFlowGraphNode, Object);
+};
+
+class ControlFlowGraph : public ObjectRef {


Looking at this structure, I think it could be generalized to also represent a data-dependency graph, and most of the functionality would also carry over.

The predecessors in a control-flow graph are analogous to the inputs of a data-dependency graph, and both could be represented by Array<Var>.

The successors in a control-flow graph are analogous to the outputs of a data-dependency graph, and both could be represented by Array<Var>.

The DataflowAnalysis function would operate identically in both cases, either flowing things that are known at a specific time for a control-flow graph, or flowing things that are known about a specific value for a data-dependency graph.

What are your thoughts on generalizing the utility? I think the main drawback would be if there's a fundamental assumption made about the graph structure that only holds for one of the two, but they look like they might be similar enough to have lots of overlap.

I think a dependency graph is more specific than a CFG, so I would have to think about whether the same analyses would work on one. It's worth further thought.

Lunderberg · 2023-10-02T20:29:19Z

src/relax/analysis/dataflow_analysis.cc

+    // 1 predecessor: A branch in an If node (no merge needed)
+    // 2 predecessors: The merge block after an If node (merge needed)
+    // (Analogous for successors in backward analysis)
+    inputs->operator[](idx) = (prev.size() == 0)   ? init


Nit: If this is declared as std::vector<ObjectRef>& inputs = (forward)? out_map : in_map;, then the LHS of the assignment becomes inputs[idx] instead of inputs->operator[](idx).
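
The nit works because a conditional expression whose two operands are lvalues of the same type is itself an lvalue, so it can bind to a non-const reference. A small standalone sketch, with a hypothetical `SetInput` helper and `int` standing in for ObjectRef; the choice of map per direction simply mirrors the declaration quoted in the comment:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes `value` into the map selected by `forward`.
void SetInput(bool forward, std::vector<int>& in_map, std::vector<int>& out_map,
              std::size_t idx, int value) {
  // Pointer form, as in the diff: needs ->operator[](idx) or (*ptr)[idx].
  //   std::vector<int>* inputs = forward ? &out_map : &in_map;
  //   inputs->operator[](idx) = value;

  // Reference form: both operands are lvalues of the same type, so the
  // conditional yields an lvalue and plain indexing works.
  std::vector<int>& inputs = forward ? out_map : in_map;
  assert(idx < inputs.size());
  inputs[idx] = value;
}
```

Both forms modify the selected vector in place; the reference version only changes how the assignment reads.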

slyubomirsky · 2023-11-14T22:17:13Z

Thanks for the comments, @Lunderberg. I may revisit this implementation for automatically extracting DataflowBlocks.

slyubomirsky · 2023-12-05T04:13:52Z

It looks like it wasn't needed for #16204 thanks to CanonicalizeBindings, so there is still no immediate need for it. We can continue discussing if we want this analysis for something else.

slyubomirsky requested a review from tqchen September 8, 2023 19:06

Lunderberg reviewed Oct 2, 2023

slyubomirsky mentioned this pull request Nov 14, 2023

[Unity][Transform] Replace eligible operators with in-place versions in dataflow blocks #16129

Merged

slyubomirsky added 14 commits November 27, 2023 21:08

Add control flow graph implementation

6c5d166

Implement dataflow analysis framework and add tests

f4f5de1

Correct doc comment

a3b6657

Phrase dataflow analysis per binding instead of per basic block

26b0d21

Add GetBoundValue utility function

9eaa271

Add GetBindingIndex helper function

2f9abb9

Fixing naming convention in Python dataflow analysis functions

fd7ccbf

Implement liveness analysis

5510984

Trailing newline

adb2d7d

Python style fixes

d143844

Header file style fixes

1cd9d50

Remove redundant check

bf6f016

Add liveness analysis to __init__ for analysis

171fac6

No need to distinguish between FreeVars and AllVars

4289b5a

slyubomirsky force-pushed the liveness-analysis branch from 28f3116 to 4289b5a Compare November 29, 2023 03:02

slyubomirsky added 4 commits November 28, 2023 22:03

Use enum class instead of an int for enums

44da801

Map over the results of the liveness analysis when doing data structure conversion for safety

8516d23

Use constructors for CFG structures instead of Create functions

90e09b4

Formatting

7679535

masahi force-pushed the unity branch from 7c35267 to c796f47 Compare December 18, 2023 09:57

junrushao force-pushed the unity branch 2 times, most recently from c95d45f to 45eeb8c Compare December 18, 2023 21:00

slyubomirsky mentioned this pull request Jan 17, 2024

[Unity][Tracking Issue] In-place operations #15319

Closed

4 tasks

tqchen deleted the branch apache:unity March 29, 2024 12:18

tqchen closed this Mar 29, 2024

3 participants