diff --git a/docs/dev/index.rst b/docs/dev/index.rst
index c7a52c6de13b..2734a816dc68 100644
--- a/docs/dev/index.rst
+++ b/docs/dev/index.rst
@@ -12,4 +12,5 @@ In this part of documentation, we share the rationale for the specific choices m
    nnvm_json_spec
    nnvm_overview
    hybrid_script
-   relay_add_op
+   relay_intro
+   relay_add_op
\ No newline at end of file
diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst
new file mode 100644
index 000000000000..d3c83590cbb8
--- /dev/null
+++ b/docs/dev/relay_intro.rst
@@ -0,0 +1,188 @@
+Introduction to Relay IR
+========================
+This article introduces Relay IR -- the second generation of NNVM.
+We expect readers from two kinds of backgrounds -- those who have a programming language background, and deep learning
+framework developers who are familiar with the computational graph representation.
+
+We briefly summarize the design goals here, and will touch upon these points later in the article.
+
+- Support traditional data-flow style programming and transformations.
+- Support functional-style scoping and let bindings, making Relay a fully featured differentiable language.
+- Allow the user to mix the two programming styles.
+
+Build Computational Graph with Relay
+------------------------------------
+Traditional deep learning frameworks use computational graphs as their intermediate representation.
+A computational graph (or data-flow graph) is a directed acyclic graph (DAG) that represents the computation.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
+   :align: center
+   :scale: 70%
+
+
+You can use Relay to build a computational (data-flow) graph. Specifically, the above code shows how to
+construct a simple two-node graph. You will find that the syntax of the example is not that different from existing
+computational graph IRs like NNVMv1; the only difference is in terminology:
+
+- Existing frameworks usually use graphs and subgraphs.
+- Relay uses functions, e.g. ``fn (%x)``, to indicate the graph.
+
+Each data-flow node is a CallNode in Relay. The Relay Python DSL allows you to construct a data-flow graph quickly.
+One thing we want to highlight in the above code is that we explicitly constructed an Add node whose
+two inputs both point to ``%1``. When a deep learning framework evaluates the above program, it will compute
+the nodes in topological order, and ``%1`` will only be computed once.
+While this fact is very natural to deep learning framework builders, it might at first
+surprise a PL researcher. If we implement a simple visitor that prints out the result,
+treating the result as a nested Call expression, it becomes ``log(%x) + log(%x)``.
+
+Such ambiguity is caused by different interpretations of the program semantics when there is a shared node in the DAG.
+In a normal functional programming IR, nested expressions are treated as expression trees, which ignores the
+fact that ``%1`` is actually reused twice in ``%2``.
+
+The Relay IR chooses to be mindful of this difference. Usually, deep learning framework users build the computational
+graph in this fashion, where DAG node reuse often occurs. As a result, when we print out a Relay program in
+the text format, we print one CallNode per line and assign a temporary id (``%1``, ``%2``) to each CallNode so each common
+node can be referenced in later parts of the program.
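+For a concrete feel of the Python DSL, here is a minimal sketch of how the two-node graph above might be
+constructed. The exact API surface assumed here (``relay.var``, ``relay.log``, ``relay.add``,
+``relay.Function``) follows the current Relay Python bindings and may evolve.
+
+.. code:: python
+
+    from tvm import relay
+
+    # fn (%x) { %1 = log(%x); %2 = add(%1, %1); %2 }
+    x = relay.var("x", shape=(3, 4), dtype="float32")
+    v1 = relay.log(x)
+    v2 = relay.add(v1, v1)       # both inputs point to the same node
+    f = relay.Function([x], v2)
+    print(f)                     # the text printer assigns one %id per CallNode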
+
+Module: Support Multiple Functions (Graphs)
+--------------------------------------------
+So far we have introduced how to build a data-flow graph as a function. One might naturally ask: can we support multiple
+functions and enable them to call each other? Relay allows grouping multiple functions together in a module; the code below
+shows an example of a function calling another function.
+
+.. code::
+
+  def @muladd(%x, %y, %z) {
+    %1 = mul(%x, %y)
+    %2 = add(%1, %z)
+    %2
+  }
+  def @myfunc(%x) {
+    %1 = @muladd(%x, 1, 2)
+    %2 = @muladd(%1, 2, 3)
+    %2
+  }
+
+The Module can be viewed as a ``Map<GlobalVar, Function>``. Here GlobalVar is just an id that is used to represent the functions
+in the module. ``@muladd`` and ``@myfunc`` are GlobalVars in the above example. When a CallNode is used to call another function,
+the corresponding GlobalVar is stored in the op field of the CallNode. It contains a level of indirection -- we need to look up the
+body of the called function from the module using the corresponding GlobalVar. In this particular case, we could also directly
+store the reference to the Function as the op in the CallNode. So, why do we need to introduce GlobalVar? The main reason is that
+GlobalVar decouples the definition from the declaration, and enables recursion and delayed declaration of the function.
+
+.. code::
+
+  def @myfunc(%x) {
+    %1 = equal(%x, 1)
+    if (%1) {
+      %x
+    } else {
+      %2 = sub(%x, 1)
+      %3 = @myfunc(%2)
+      %4 = add(%3, %3)
+      %4
+    }
+  }
+
+In the above example, ``@myfunc`` recursively calls itself. Using the GlobalVar ``@myfunc`` to represent the function avoids
+a cyclic dependency in the data structure.
+At this point, we have introduced the basic concepts of Relay. Notably, Relay has the following improvements over NNVMv1:
+
+- A succinct text format that eases debugging while writing passes.
+- First-class support for subgraphs as functions in a joint module; this enables further chances for joint optimizations, such as inlining and calling convention specification.
+- Native front-end language interop -- for example, all the data structures can be visited in Python, which allows quick prototyping of optimizations in Python and mixing them with C++ code.
+
+
+Let Binding and Scopes
+----------------------
+
+So far, we have introduced how to build a computational graph in the good old way used in deep learning frameworks.
+This section will talk about an important new construct introduced by Relay -- let bindings.
+
+Let bindings are used in every high-level programming language. In Relay, a let binding is a data structure with three
+fields: ``Let(var, value, body)``. When we evaluate a let expression, we first evaluate the value part, assign
+it to the var, then return the result of evaluating the body expression.
+
+You can use a sequence of let bindings to construct a program that is logically equivalent to a data-flow program.
+The code example below shows the same program in the two forms side by side.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
+   :align: center
+   :scale: 70%
+
+
+The nested let binding is called A-normal form, and it is commonly used as an IR in functional programming languages.
+Now, please take a close look at the two AST structures. While the two programs are semantically identical
+(and so are their textual representations, except that A-normal form has the let prefix), their AST structures differ.
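+As a rough sketch of how the A-normal form version might be built from the Python DSL (assuming the
+``relay.Let`` constructor takes ``(var, value, body)`` as described above):
+
+.. code:: python
+
+    from tvm import relay
+
+    # let %v1 = log(%x); let %v2 = add(%v1, %v1); %v2
+    x = relay.var("x")
+    v1 = relay.var("v1")
+    v2 = relay.var("v2")
+    body = relay.Let(v1, relay.log(x),
+                     relay.Let(v2, relay.add(v1, v1), v2))
+    f = relay.Function([x], body)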
+
+Since program optimizations take these AST data structures and transform them, the two different structures will
+affect the compiler code we are going to write. For example, if we want to detect the pattern ``add(log(x), y)``:
+
+- In the data-flow form, we can first access the add node, then directly look at its first argument to see whether it is a log.
+- In the A-normal form, we cannot do the check directly anymore, because the first input to add is ``%v1`` -- we need to keep a map from each variable to its bound value and look ``%v1`` up in that map in order to know that it is a log.
+
+Different data structures impact how you write transformations, and we need to keep that in mind.
+So now, as a deep learning framework developer, you might ask: why do we need let bindings at all?
+Your PL friends will always tell you that let is important -- as PL is a quite established field,
+there must be some wisdom behind that.
+
+
+Why We Might Need Let Binding
+-----------------------------
+One key usage of let binding is that it specifies the scope of computation. Let us take a look at the following example,
+which does not use let bindings.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
+   :align: center
+   :scale: 70%
+
+The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems
+to suggest that we should evaluate node ``%1`` outside the if scope, the AST (as shown in the picture) does not suggest so.
+Actually, a data-flow graph never defines its scope of evaluation. This introduces some ambiguity into the semantics.
+
+This ambiguity becomes more interesting when we have closures. Consider the following program, which returns a closure.
+We do not know where ``%1`` should be computed: it could be either outside or inside the closure.
+
+.. code::
+
+  fn (%x) {
+    %1 = log(%x)
+    %2 = fn(%y) {
+      add(%y, %1)
+    }
+    %2
+  }
+
+Let binding solves this problem, as the computation of the value happens at the let node. In both programs,
+if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specify the computation location to
+be outside the if scope and the closure. As you can see, let binding gives a more precise specification of the computation
+site and can be useful when we generate backend code (as such a specification is part of the IR).
+
+On the other hand, the data-flow form, which does not specify the scope of computation, has its own advantages
+-- we do not need to worry about where to put the let when we generate the code. The data-flow form also gives the later
+passes more freedom to decide where to put the evaluation point. As a result, it might not be a bad idea to use the data-flow
+form of the program in the initial phases of optimization when you find it convenient.
+Many optimizations in Relay today are written to optimize data-flow programs.
+
+However, when we lower the IR to an actual runtime program, we need to be precise about the scope of computation.
+In particular, we want to explicitly specify the scope of computation when we use
+sub-functions and closures. Let binding can be used to solve this problem in later-stage, execution-specific optimizations.
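+To make this concrete, here is a sketch of how the let-bound version of the closure example might be built
+from the Python DSL; placing the ``relay.Let`` outside the inner function pins the evaluation of ``log(%x)``
+outside the closure (again assuming the ``relay.Let``/``relay.Function`` bindings used earlier):
+
+.. code:: python
+
+    from tvm import relay
+
+    # fn (%x) { let %v1 = log(%x); fn (%y) { add(%y, %v1) } }
+    x = relay.var("x")
+    y = relay.var("y")
+    v1 = relay.var("v1")
+    inner = relay.Function([y], relay.add(y, v1))  # closure capturing %v1
+    outer = relay.Function([x], relay.Let(v1, relay.log(x), inner))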
+
+Implication on IR Transformations
+---------------------------------
+
+Hopefully, by now you are familiar with the two kinds of representations.
+Most functional programming languages do their analysis in A-normal form,
+where the analyzer does not need to be mindful that the expressions are DAGs.
+
+Relay chooses to support both the data-flow form and let bindings. We believe that it is important to let
+framework developers choose the representation they are familiar with.
+This does, however, have some implications for how we write passes:
+
+- If you come from a data-flow background and want to handle lets, keep a map from each var to its bound expression so you can perform a lookup when encountering a var. This likely means a minimal change, as we already need a map from an expression to its transformed expression anyway. Note that this will effectively remove all the lets in the program.
+- If you come from a PL background and like A-normal form, we will provide a dataflow-to-A-normal-form pass.
+- For PL folks, when you are implementing something (like a dataflow-to-ANF transformation), be mindful that the expressions can be DAGs. This usually means that we should visit expressions with a ``Map`` and only compute the transformed result once, so the resulting expression keeps the common structure (see the sketch at the end of this article).
+
+There are additional advanced concepts, such as symbolic shape inference and polymorphic functions,
+that are not covered by this material; you are more than welcome to look at other materials.
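+As a closing illustration of the last point in the list above, below is a rough sketch of a DAG-aware rewriter.
+The helper name ``transform`` and the ``rewrite`` callback are hypothetical; Relay itself provides visitor
+helpers (e.g. ``ExprMutator``) for this purpose, but the memoization idea is the same.
+
+.. code:: python
+
+    from tvm import relay
+
+    def transform(expr, rewrite, memo=None):
+        # Post-order rewrite that visits every node exactly once.
+        # The memo map preserves sharing: a node that is reused in the
+        # DAG is transformed a single time, so the result expression
+        # keeps the common structure instead of being duplicated.
+        if memo is None:
+            memo = {}
+        if expr in memo:
+            return memo[expr]
+        if isinstance(expr, relay.Call):
+            new_args = [transform(a, rewrite, memo) for a in expr.args]
+            new_expr = relay.Call(expr.op, new_args, expr.attrs)
+        else:
+            new_expr = expr  # vars, constants, etc. are left as-is in this sketch
+        memo[expr] = rewrite(new_expr)
+        return memo[expr]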