12 changes: 12 additions & 0 deletions include/tvm/relax/analysis.h
@@ -457,6 +457,18 @@ struct VarUsageInfo {
*/
VarUsageInfo CollectVarUsage(const Expr& expr);

/*!
* \brief Perform a liveness analysis on the function, indicating which variables
* are live at which location in the function.
*
* \param fn The function to be analyzed.
Contributor

Can LivenessAnalysis be a member function of ControlFlowGraph? That way, the ControlFlowGraph would only need to be collected once if it is required by more than one analysis.

Contributor Author

It depends on whether we plan to reuse the dataflow analysis framework (I was hoping to but it doesn't sound like there's an immediate candidate for another analysis we would implement with it). If so, the CFG would be needed for many things and we probably wouldn't want to make all of them methods of the class.

Contributor

Good point. I usually lean toward member functions, because they are much easier for a developer to discover, and avoid requiring long names to provide enough context to a reader. It would be nice if there were something similar to the extension methods of C# or Rust, to allow the improved readability without making monolithic classes.

* \return An array of arrays of live variables per binding in the function.
* The array is indexed based on the corresponding control flow graph,
* so use `ExtractCFG` and `GetBindingIndex` to match locations in `fn`
* to indices in the result.
*/
Array<Array<Var>> LivenessAnalysis(const Function& fn);
Contributor

Instead of Array<Array<Var>>, could we return Map<Var, Array<Var>>? That is, a map from the variable being bound to the list of variables that are live while the value of the variable is being computed. That would avoid requiring a user of the function to know the internal indexing scheme, and most of the APIs have easy access to the Var (e.g., in a mutator that implements ExprMutator::VisitBinding).

Contributor Author

Good idea, I'll see about trying it.

Contributor Author

I've looked into this and what's tricky is dealing with cases like SeqExpr body values and If conditions and merge/split points. The body does not have a var associated with it and in the case of Ifs, there is more than one CFG entry associated with one binding. Var alone would not be a good key, which is why I had the indices in the first place. We could use a CFGKey object that contains this auxiliary info or otherwise keep a data structure around for the reverse mapping, potentially.
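To make the keying problem described above concrete, here is a minimal pure-Python sketch of what a keyed result might look like. `CFGKey`, `NodeKind`, the `seq_id` field, and the string variable names are all hypothetical stand-ins, not part of this PR or of the TVM API, and the live sets are hand-written placeholders rather than output of the analysis. The only point is that a (var, kind) pair can tell the If condition apart from the merge, and that a SeqExpr body needs a key with no var.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional


class NodeKind(Enum):
    # Mirrors the four cases listed for BindingNodeKind in this PR.
    BINDING = 0
    IF_COND = 1
    IF_MERGE = 2
    SEQ_BODY = 3


@dataclass(frozen=True)
class CFGKey:
    """Hypothetical key: the bound var (None for a SeqExpr body) plus the node kind."""

    var: Optional[str]
    kind: NodeKind
    # Hypothetical extra field to tell apart the bodies of different SeqExprs,
    # since a body has no bound var of its own.
    seq_id: int = 0


# Placeholder result for:  x = ...; y = If(x) {...} else {...}; return y
# The sets are illustrative only, not computed by any analysis.
live: Dict[CFGKey, FrozenSet[str]] = {
    CFGKey("x", NodeKind.BINDING): frozenset(),
    CFGKey("y", NodeKind.IF_COND): frozenset({"x"}),
    CFGKey("y", NodeKind.IF_MERGE): frozenset(),
    CFGKey(None, NodeKind.SEQ_BODY): frozenset({"y"}),
}

# The same var "y" appears in two distinct keys, which a plain Map<Var, ...> cannot express.
print(CFGKey("y", NodeKind.IF_COND) != CFGKey("y", NodeKind.IF_MERGE))
```

A frozen dataclass is used so the key is hashable; whether an equivalent object would be practical on the C++/FFI side is left open here.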


/*!
* \brief Remove unused statements inside DataflowBlocks.
*
197 changes: 197 additions & 0 deletions include/tvm/relax/dataflow_analysis.h
@@ -0,0 +1,197 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

/*!
* \file tvm/relax/dataflow_analysis.h
* \brief A reusable framework for dataflow analysis in Relax.
* Based on Adrian Sampson's course material:
* https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/4/
* Do not confuse with dataflow pattern matching (does not use this machinery)
*/

#ifndef TVM_RELAX_DATAFLOW_ANALYSIS_H_
#define TVM_RELAX_DATAFLOW_ANALYSIS_H_

#include <tvm/relax/analysis.h>
#include <tvm/relax/expr.h>
#include <tvm/runtime/object.h>

#include <utility>

namespace tvm {
namespace relax {

/*! \brief For dataflow analysis, we need to have a control flow graph.
* We will organize this graph by bindings, which allows analyses to
* state their results for each binding in a SeqExpr.
*
* There are a few cases that have to be handled:
* 1. A normal binding (most common)
* 2. The condition expression in an If node (a "split" point)
* 3. A merge point (the variable to which an If node is bound: it is a "merge" between
* the SeqExprs in the true and false branches)
* 4. The body expression in a SeqExpr (not actually bound)
Contributor

Does this imply that the analysis can only be applied to normalized relax expressions? If non-normalized, the body of a SeqExpr is bound to a variable in the containing BindingBlock or DataflowBlock.

Contributor Author

Yes, the expectation is for expressions to be normalized first.

*/
enum class BindingNodeKind { kBinding = 0, kIfCond = 1, kIfMerge = 2, kSeqBody = 3 };

class GraphBindingNode : public Object {
public:
/*! \brief The SeqExpr the binding resides in. */
SeqExpr seq;

/*! \brief The arguments to the binding. Only the first binding in the graph has arguments
Contributor

Having a parameter in all GraphBinding nodes that is only non-empty for one of them seems a bit odd. Since this list is unique across the entire graph, can we instead move this into the ControlFlowGraphNode?

Contributor Author

I don't remember if I had a reason for putting it here, but I think you're right that it's more reasonable to put it in the CFG.

* (i.e., the function arguments). */
Array<Var> args;

/*! \brief Index of the binding block in the SeqExpr where the binding is found.
Contributor

Rather than identifying a particular binding by block_idx and binding_idx, could we instead identify it by the variable being bound? The variables are already required to be unique, and it would avoid needing to keep track of which size_t is associated with which array.

* Convention: We put the SeqExpr body at one block past the final block. */
size_t block_idx;

/*! \brief Index of the binding within the binding block corresponding to this binding.
* Convention: Both the If condition and merge are mapped to the same index.
* We use the kind to distinguish. */
size_t binding_idx;

/*! \brief The kind of binding this is. */
BindingNodeKind kind;

void VisitAttrs(tvm::AttrVisitor* v) {
v->Visit("seq", &seq);
v->Visit("args", &args);
v->Visit("block_idx", &block_idx);
v->Visit("binding_idx", &binding_idx);
v->Visit("kind", &kind);
}

static constexpr const uint32_t _type_index = TypeIndex::kDynamic;
static constexpr const char* _type_key = "relax.analysis.GraphBinding";
TVM_DECLARE_BASE_OBJECT_INFO(GraphBindingNode, Object);
};
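The indexing conventions documented on GraphBindingNode (body at one block past the final block; If condition and merge sharing the same indices) can be illustrated with a small pure-Python sketch. Everything here is a toy stand-in: the tuple encoding of a SeqExpr, the `enumerate_nodes` helper, the node ordering, and the choice of `binding_idx = 0` for a body are assumptions made for illustration, not what `ExtractCFG` actually does.

```python
from typing import List, Tuple

# Toy stand-ins (hypothetical, not TVM types):
# a "seq" is (blocks, body); a block is a list of (var, rhs) pairs;
# an rhs is either a string or an ("if", cond, true_seq, false_seq) tuple.
Node = Tuple[str, int, int, str]  # (seq name, block_idx, binding_idx, kind)


def enumerate_nodes(name: str, seq) -> List[Node]:
    blocks, _body = seq
    nodes: List[Node] = []
    for block_idx, block in enumerate(blocks):
        for binding_idx, (_var, rhs) in enumerate(block):
            if isinstance(rhs, tuple) and rhs[0] == "if":
                _, _cond, true_seq, false_seq = rhs
                # Split point: the condition gets its own node ...
                nodes.append((name, block_idx, binding_idx, "kIfCond"))
                nodes.extend(enumerate_nodes(name + ".true", true_seq))
                nodes.extend(enumerate_nodes(name + ".false", false_seq))
                # ... and the merge reuses the same indices, told apart by kind.
                nodes.append((name, block_idx, binding_idx, "kIfMerge"))
            else:
                nodes.append((name, block_idx, binding_idx, "kBinding"))
    # Convention from the header: the body sits one block past the final block.
    # Using binding_idx 0 for the body is an assumption of this sketch.
    nodes.append((name, len(blocks), 0, "kSeqBody"))
    return nodes


true_branch = ([[("a", "add(x, x)")]], "a")
false_branch = ([[("b", "mul(x, x)")]], "b")
main = ([[("c", "cond(x)"), ("y", ("if", "c", true_branch, false_branch))]], "y")

for node in enumerate_nodes("main", main):
    print(node)
```

Note how the binding for `y` shows up twice with identical `block_idx` and `binding_idx`, which is why the kind field is needed to distinguish the two entries.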

/*! \brief Representation of a binding in the control flow graph */
class GraphBinding : public ObjectRef {
public:
/*!
* \brief Create a GraphBinding. See the docs on GraphBindingNode for further details.
*
* \param seq: The SeqExpr in which the binding resides.
* \param args: The arguments to the binding (only nonempty for the first binding:
* these will be the function arguments)
* \param block_idx: The index of the BindingBlock in the SeqExpr
* where the binding resides (for the return expression, use one past the final block).
* \param binding_idx: The index of the binding in the BindingBlock corresponding to the binding.
* \param kind: The kind of binding this is. (Used especially to distinguish If node conditions
* from the merge after the If)
*/
TVM_DLL GraphBinding(const SeqExpr& seq, const Array<Var>& args, size_t block_idx,
size_t binding_idx, BindingNodeKind kind);

TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS(GraphBinding, ObjectRef, GraphBindingNode);
};

/*! \brief A control flow graph corresponding to a function. */
class ControlFlowGraphNode : public Object {
public:
/*! \brief The bindings in the graph. 0 is the entry point. */
Array<GraphBinding> bindings;
/*! \brief The ith member is the list of predecessors (indices) to binding i in bindings. */
Array<Array<Integer>> preds;
Contributor

Can preds and succs be moved to the GraphBindingNode instead? That way, we make it impossible for these three lists to erroneously have mismatched sizes, and also make it immediately clear to readers which predecessors are associated with which nodes.

Contributor

As I'm thinking on it, that would also benefit from using Var to track locations within an expression, rather than size_t indices. Each GraphBindingNode would hold Array<Var> predecessors and Array<Var> successors.

Contributor Author

I'll see if there might be a better way to set up the data structures.

/*! \brief The ith member is the list of successors (indices) to binding i in bindings. */
Array<Array<Integer>> succs;

void VisitAttrs(tvm::AttrVisitor* v) {
v->Visit("bindings", &bindings);
v->Visit("preds", &preds);
v->Visit("succs", &succs);
}

static constexpr const uint32_t _type_index = TypeIndex::kDynamic;
static constexpr const char* _type_key = "relax.analysis.ControlFlowGraph";
TVM_DECLARE_BASE_OBJECT_INFO(ControlFlowGraphNode, Object);
};
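The concern raised in review about the three parallel arrays getting out of sync can be sketched in plain Python. The helpers below (`succs_from_preds`, `check_cfg`) are hypothetical and exist only to show the invariant that `bindings`, `preds`, and `succs` are expected to satisfy: equal lengths, with `succs` being the inverse of `preds`.

```python
from typing import List


def succs_from_preds(preds: List[List[int]]) -> List[List[int]]:
    """Invert a predecessor list: j is a successor of i iff i is a predecessor of j."""
    succs: List[List[int]] = [[] for _ in preds]
    for node, node_preds in enumerate(preds):
        for pred in node_preds:
            succs[pred].append(node)
    return succs


def check_cfg(num_bindings: int, preds: List[List[int]], succs: List[List[int]]) -> None:
    """Raise if the parallel arrays disagree in size or content."""
    if not (len(preds) == len(succs) == num_bindings):
        raise ValueError("bindings, preds and succs must have the same length")
    expected = succs_from_preds(preds)
    if [sorted(s) for s in succs] != [sorted(s) for s in expected]:
        raise ValueError("succs is not the inverse of preds")


# Diamond shape: 0 = If condition, 1 and 2 = the two branches, 3 = merge.
preds = [[], [0], [0], [1, 2]]
succs = succs_from_preds(preds)
check_cfg(4, preds, succs)
print(succs)
```

Storing the edges on each node, as suggested in the thread, would make the length check unnecessary by construction; the inverse-relationship check would still apply.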

class ControlFlowGraph : public ObjectRef {
Contributor

Looking at this structure, I think it could be generalized to also represent a data-dependency graph, and most of the functionality would also carry over.

  • The predecessors in a control-flow graph are analogous to the inputs of a data-dependency graph, and both could be represented by Array<Var>.
  • The successors in a control-flow graph are analogous to the outputs of a data-dependency graph, and both could be represented by Array<Var>.
  • The DataflowAnalysis function would operate identically in both cases, either flowing things that are known at a specific time for a control-flow graph, or flowing things that are known about a specific value for a data-dependency graph.

What are your thoughts on generalizing the utility? I think the main drawback would be if there's a fundamental assumption made about the graph structure that only holds for one of the two, but they look like they might be similar enough to have lots of overlap.

Contributor Author

I think a dependency graph is more specific than a CFG, so I would have to think about whether the same analyses would work on one. It's worth further thought.

public:
/*!
* \brief Create a ControlFlowGraph.
*
* \param bindings: The bindings in the graph
* \param preds: List of lists of predecessors to each binding.
* \param succs: List of lists of successors to each binding.
*/
TVM_DLL ControlFlowGraph(const Array<GraphBinding>& bindings, const Array<Array<Integer>>& preds,
const Array<Array<Integer>>& succs);

TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS(ControlFlowGraph, ObjectRef, ControlFlowGraphNode);
};

/*!
* \brief Extracts the control flow graph for a Relax function.
* \param func The function. This conversion expects it to be normalized.
* \return The control flow graph corresponding to the function.
*/
ControlFlowGraph ExtractCFG(const Function& func);
Contributor

Can we accept a relax::Expr instead of a Function? That would allow it to be used in more cases, such as a SeqExpr generated inside a function visitor.

Contributor Author

Doesn't make a huge difference in the code, so I think we should.


/*!
* \brief Generic implementation of dataflow analysis, based on
* Adrian Sampson's course material, except binding by binding
* instead of basic block by basic block:
* https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/4/
*
* The analysis creates input and output maps (mapping binding indices to a domain),
* sets the initial input and output for each binding to the init value, and then
* performs a traversal of the CFG (BFS in this implementation, since unlike the general case,
* we do not have loops) and uses the transfer and merge function to update the inputs and
* outputs. The analysis can proceed forwards (from binding 0 onwards) or backwards (from the
* last binding back), flipping the roles of the input and output maps in the backward case.
*
* \param forward Whether to perform a forward or backward analysis
* \param cfg The input control flow graph
* \param init The value corresponding to an initial domain
* \param transfer_func Given an input domain and a binding, determine the resulting domain
* \param merge_func Given a set of domains, combine them to form a single new domain
* (note: in Relax, a binding can never have more than two predecessors/successors)
*
* \return Two arrays, the first being the "input map" (domain being passed *into*
* each binding in the CFG) and the second being the "output map" (the domain
* being passed *out of* the corresponding binding)
*/
std::pair<Array<ObjectRef>, Array<ObjectRef>> DataflowAnalysis(
Contributor

Should this function be externally exposed? From the changes in this PR on its own, it looks like an implementation detail for LivenessAnalysis, but the function signature suggests that it is intended for more general use.

Contributor Author

I think you're right, the use would be mainly internal. It might be good to expose it for testing, or to allow for defining analyses in Python.

const ControlFlowGraph& cfg, const ObjectRef& init,
std::function<ObjectRef(const GraphBinding&, const ObjectRef&)> transfer_func,
std::function<ObjectRef(const ObjectRef&, const ObjectRef&)> merge_func, bool forward = true);
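For readers unfamiliar with the framework described in the doc comment, the following is a minimal pure-Python sketch of a generic dataflow solver with the same ingredients (an init value, a transfer function, a merge function, and a forward/backward flag), instantiated for liveness. It is not the implementation in this PR: it uses a worklist rather than the single BFS pass the header mentions, node indices stand in for GraphBinding objects, and the toy CFG, `defs`, and `uses` tables are hypothetical.

```python
from collections import deque
from functools import reduce
from typing import Callable, FrozenSet, List, Tuple

Domain = FrozenSet[str]


def dataflow_analysis(
    preds: List[List[int]],
    succs: List[List[int]],
    init: Domain,
    transfer: Callable[[int, Domain], Domain],
    merge: Callable[[Domain, Domain], Domain],
    forward: bool = True,
) -> Tuple[List[Domain], List[Domain]]:
    """Return (flow_in, flow_out), both taken in the direction of the analysis."""
    # A backward analysis swaps the roles of predecessors and successors.
    sources, sinks = (preds, succs) if forward else (succs, preds)
    n = len(preds)
    flow_in, flow_out = [init] * n, [init] * n
    worklist = deque(range(n) if forward else reversed(range(n)))
    while worklist:
        i = worklist.popleft()
        incoming = [flow_out[j] for j in sources[i]]
        flow_in[i] = reduce(merge, incoming) if incoming else init
        new_out = transfer(i, flow_in[i])
        if new_out != flow_out[i]:
            flow_out[i] = new_out
            worklist.extend(sinks[i])  # revisit nodes that consume this result
    return flow_in, flow_out


# Toy CFG (hypothetical):
#   0: x = f(p)      1: if-condition on x
#   2: a = g(x)      3: true-branch body (a)
#   4: b = h(p)      5: false-branch body (b)
#   6: merge into y  7: function body (y)
defs = [{"x"}, set(), {"a"}, set(), {"b"}, set(), {"y"}, set()]
uses = [{"p"}, {"x"}, {"x"}, {"a"}, {"p"}, {"b"}, set(), {"y"}]
preds = [[], [0], [1], [2], [1], [4], [3, 5], [6]]
succs = [[1], [2, 4], [3], [6], [5], [6], [7], []]


def liveness_transfer(i: int, live_after: Domain) -> Domain:
    # Standard liveness equation: live_before = uses | (live_after - defs).
    return frozenset(uses[i]) | (live_after - frozenset(defs[i]))


live_after, live_before = dataflow_analysis(
    preds, succs, frozenset(), liveness_transfer, lambda a, b: a | b, forward=False
)
for i, live in enumerate(live_before):
    print(i, sorted(live))
```

Because liveness runs backward, the first returned list holds the variables live after each node and the second the variables live before it. How the PR's `DataflowAnalysis` orders its two returned arrays in the backward case should be checked against the implementation rather than inferred from this sketch.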

/*! \brief A helper function. Given an index into a SeqExpr, give the index of the GraphBinding
* in the CFG.
*
* \param cfg The control flow graph.
* \param seq The target SeqExpr.
* \param block_idx The target block in the SeqExpr.
* Convention: Use one past the last block to indicate the SeqExpr body.
* \param binding_idx The target binding in the target block.
* \param match_cond If the RHS of the target binding is an IfExpr, then if match_cond is true,
* the returned index will be for the condition node; otherwise it will be for the merge node.
*/
size_t GetBindingIndex(const ControlFlowGraph& cfg, const SeqExpr& seq, size_t block_idx,
size_t binding_idx, bool match_cond);
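The lookup that `GetBindingIndex` documents can be sketched as a linear search in pure Python. The `Node` tuple, the string used to identify a SeqExpr, and the sample node list are hypothetical stand-ins; the real function may be implemented quite differently. The sketch only illustrates the documented role of `match_cond`, which matters when an If binding contributes two nodes at the same indices.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    seq: str  # stand-in for the identity of the SeqExpr
    block_idx: int
    binding_idx: int
    kind: str  # "kBinding", "kIfCond", "kIfMerge" or "kSeqBody"


def get_binding_index(
    nodes: List[Node], seq: str, block_idx: int, binding_idx: int, match_cond: bool
) -> int:
    for i, node in enumerate(nodes):
        if (node.seq, node.block_idx, node.binding_idx) != (seq, block_idx, binding_idx):
            continue
        # An If binding yields two nodes at the same indices; match_cond picks one.
        if node.kind == "kIfCond" and not match_cond:
            continue
        if node.kind == "kIfMerge" and match_cond:
            continue
        return i
    raise ValueError("no matching binding in the CFG")


# Hypothetical node list for: c = ...; y = If(c) {a = ...; a} else {b = ...; b}; y
nodes = [
    Node("main", 0, 0, "kBinding"),
    Node("main", 0, 1, "kIfCond"),
    Node("main.true", 0, 0, "kBinding"),
    Node("main.true", 1, 0, "kSeqBody"),
    Node("main.false", 0, 0, "kBinding"),
    Node("main.false", 1, 0, "kSeqBody"),
    Node("main", 0, 1, "kIfMerge"),
    Node("main", 1, 0, "kSeqBody"),
]

print(get_binding_index(nodes, "main", 0, 1, match_cond=True))
print(get_binding_index(nodes, "main", 0, 1, match_cond=False))
```

For a normal binding, `match_cond` has no effect in this sketch, which matches the header's wording that it only applies when the RHS is an If.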

} // namespace relax
} // namespace tvm
#endif // TVM_RELAX_DATAFLOW_ANALYSIS_H_
1 change: 1 addition & 0 deletions python/tvm/relax/analysis/__init__.py
@@ -32,6 +32,7 @@
get_static_type,
get_var2val,
has_reshape_pattern,
liveness_analysis,
name_to_binding,
post_order_visit,
remove_all_unused,
25 changes: 24 additions & 1 deletion python/tvm/relax/analysis/analysis.py
@@ -21,7 +21,7 @@
configuring the passes and scripting them in Python.
"""

-from typing import Dict, List, Optional, Union, Callable
+from typing import Dict, List, Optional, Set, Union, Callable
from enum import IntEnum

import tvm
@@ -407,6 +407,29 @@ def udchain(dfb: DataflowBlock) -> Dict[Var, List[Var]]:
    return _ffi_api.udchain(dfb)  # type: ignore


def liveness_analysis(func: Function) -> List[Set[Var]]:
    """
    Perform a liveness analysis on the given function, returning the set of
    variables that are live at each program location.

    Parameters
    ----------
    func: Function
        The function to be analyzed

    Returns
    -------
    ret: List[Set[Var]]
        The set of live variables for each binding in the function.
        The indexing is determined by the control flow graph, so
        use `extract_cfg` and `get_binding_index` to find the index
        for a given program location in the list.
    """
    live_lists = _ffi_api.LivenessAnalysis(func)
    # convert the lists to sets
    return [set(live_list) for live_list in live_lists]
Contributor

What is the advantage of converting the list to a set? If it is required for de-duplication, we probably should do that on the C++ side so that C++ callers also get de-duplicated outputs.

Contributor Author

The annoying thing is that there is no (to my knowledge) Set in the FFI classes, unless we use a Map with dummy values. The conversion is for convenience in Python. I guess using a Map with dummy values would suffice though.

Contributor

Definitely agreed on that lack. The convention I've seen elsewhere (e.g. TIRVarsInStructInfo) is to return a de-duplicated Array from the C++ API. If there's some default ordering (e.g. first appearance in a Function), the tvm::support::OrderedSet can be useful for the implementation.
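The "de-duplicated array with a stable order" convention mentioned above has a simple Python analogue, shown here only to illustrate the behavior being discussed. The helper name `dedup_preserving_order` is hypothetical and strings stand in for Var objects; it relies on the fact that Python dicts preserve insertion order.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedup_preserving_order(items: Iterable[T]) -> List[T]:
    # dict preserves insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(items))


print(dedup_preserving_order(["x", "y", "x", "z", "y"]))
```

Unlike `set(...)`, this keeps a deterministic order (for example, first appearance in the function), which makes results easier to compare in tests.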



def name_to_binding(func: Function) -> Dict[str, List[Binding]]:
    """Return a map from variable name to its bindings."""
    return _ffi_api.name_to_binding(func)  # type: ignore