-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[Unity][Analysis] Dataflow analysis framework and liveness analysis #15689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lunderberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this functionality, and there's enough partial implementations of it floating around that it would be good to consolidate them together.
The implementation looks pretty solid to me, with most of my comments focusing on usability and extensibility.
| * 2. The condition expression in an If node (a "split" point) | ||
| * 3. A merge point (the variable to which an If node is bound: it is a "merge" between | ||
| * the SeqExprs in the true and false branches) | ||
| * 4. The body expression in a SeqExpr (not actually bound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that the analysis can only be applied to normalized relax expressions? If non-normalized, the body of a SeqExpr is bound to a variable in the containing BindingBlock or DataflowBlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the expectation is for expressions to be normalized first.
| * so use `ExtractCFG` and `GetBindingIndex` to match locations in `fn` | ||
| * to indices in the result. | ||
| */ | ||
| Array<Array<Var>> LivenessAnalysis(const Function& fn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Array<Array<Var>>, could we return Map<Var, Array<Var>>? That is, a map from the variable being bound to the list of variables that are live while the value of the variable is being computed. That would avoid requiring a user of the function to know the internal indexing scheme, and most of the APIs have easy access to the Var (e.g. In a mutator that implements ExprMutator::VisitBinding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'll see about trying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into this and what's tricky is dealing with cases like SeqExpr body values and If conditions and merge/split points. The body does not have a var associated with it and in the case of Ifs, there is more than one CFG entry associated with one binding. Var alone would not be a good key, which is why I had the indices in the first place. We could use a CFGKey object that contains this auxiliary info or otherwise keep a data structure around for the reverse mapping, potentially.
| // This is an inefficient linear scan; it could be improved by keeping a map of | ||
| // SeqExprs to indices in the CFG data structure. | ||
| // That should be considered if this function poses performance issues (unlikely). | ||
| for (size_t i = 0; i < cfg->bindings.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another advantage of using a relax::Var to specify the location at which that variable is being bound, we would avoid the possible future problem of the linear scan being inefficient.
| * each binding in the CFG) and the second being the "output map" (the domain | ||
| * being passed *out of* the corresponding binding) | ||
| */ | ||
| std::pair<Array<ObjectRef>, Array<ObjectRef>> DataflowAnalysis( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be externally exposed? From the changes in this PR on its own, it looks like an implementation detail for LivenessAnalysis, but the function signature suggests that it is intended for more general use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, the use would be mainly internal. It might be good to expose it for testing, or to allow for defining analyses in Python.
| TVM_DECLARE_BASE_OBJECT_INFO(ControlFlowGraphNode, Object); | ||
| }; | ||
|
|
||
| class ControlFlowGraph : public ObjectRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this structure, I think it could be generalized to also represent a data-dependency graph, and most of the functionality would also carry over.
- Both the predecessors in a control-flow graph are analogous to the inputs for a data-dependency graph, and both could be represented by
Array<Var>. - Both the successors in a control-flow graph are analogous to the outputs of a data-dependency graph, and both could be represented by
Array<Var>. - The
DataflowAnalysisfunction would operate identically in both cases, either flowing things that are known at a specific time for a control-flow graph, or flowing things that are known about a specific value for a data-dependency graph.
What are your thoughts on generalizing the utility? I think the main drawback would be if there's a fundamental assumption made about the graph structure that only holds for one of the two, but they look like they might be similar enough to have lots of overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a dependency graph is more specific than a CFG, so I would have to think about whether the same analyses would work on one. It's worth further thought.
| // 1 predecessor: A branch in an If node (no merge needed) | ||
| // 2 predecessors: The merge block after an If node (merge needed) | ||
| // (Analogous for successors in backward analysis) | ||
| inputs->operator[](idx) = (prev.size() == 0) ? init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If this is declared as std::vector<ObjectRef>& inputs = (forward)? out_map : in_map;, then the LHS of the assignment becomes inputs[idx] instead of inputs->operator[](idx).
|
Thanks for the comments, @Lunderberg. I may revisit this implementation for automatically extracting |
28f3116 to
4289b5a
Compare
|
It looks like it wasn't needed for #16204 thanks to |
c95d45f to
45eeb8c
Compare
As part of #15319, this PR implements liveness analysis, which is implemented using a dataflow analysis framework similar to that described by Adrian Sampson in these lecture notes: https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/4/. See also Chapter 5 of Static Program Analysis by Møller and Schwartzbach.
This required, for good or ill, introducing quite a bit of infrastructure. Here is a summary:
bodyfield ofSeqExprs and we have to consider the condition, true branch, false branch, and the merge point at the end of anIfnode, so the mapping is not one-to-one).I highly encourage review, since I am a bit nervous about introducing new infrastructure; I am especially concerned about having clear naming (I am worried about having a name so close to the dataflow pattern matching). I felt it was unavoidable in this case, because there does not seem to be any other way for us to annotate program locations with liveness information. However, I am very glad to have implemented the general dataflow framework, since it could also be used for alias analysis and likely further analyses down the line.