From 80a615cce0538621285ac7540a4fc63a8784b7b7 Mon Sep 17 00:00:00 2001
From: tqchen
Date: Tue, 27 Nov 2018 20:21:25 -0800
Subject: [PATCH 1/5] [DOCS] Introduction to Relay IR.

---
 docs/dev/index.rst       |   3 +-
 docs/dev/relay_intro.rst | 190 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 docs/dev/relay_intro.rst

diff --git a/docs/dev/index.rst b/docs/dev/index.rst
index c7a52c6de13b..2734a816dc68 100644
--- a/docs/dev/index.rst
+++ b/docs/dev/index.rst
@@ -12,4 +12,5 @@ In this part of documentation, we share the rationale for the specific choices m
    nnvm_json_spec
    nnvm_overview
    hybrid_script
-   relay_add_op
+   relay_intro
+   relay_add_op
\ No newline at end of file
diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst
new file mode 100644
index 000000000000..8c5b08eeba89
--- /dev/null
+++ b/docs/dev/relay_intro.rst
@@ -0,0 +1,190 @@
+Introduction to Relay IR
+========================
+This article introduces Relay IR -- the second generation of NNVM.
+We expect readers from two kinds of background -- those who have a programming language background and deep learning
+framework developers who are familiar with the computational graph representation.
+This article is mainly written for deep learning framework developers who are familiar with the computational graph representation.
+It can also be useful for PL background readers to understand the additional rationale of the deep learning framework builders.
+
+We briefly summarize the design goal here, and will touch upon these points in the later part of the article.
+
+- Support traditional data flow style programming and transformations.
+- Support functional style scoping and let-binding, making it a fully featured differentiable language.
+- Allow the user to mix the two programming styles.
+
+Build Computational Graph with Relay
+------------------------------------
+Traditional deep learning frameworks use computational graphs as their intermediate representation.
+A computational graph (or data-flow graph), is a directed acyclic graph (DAG) that represent the computation.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
+   :align: center
+   :scale: 70%
+
+
+You can use Relay to build a computational(dataflow) graph. Specifically, the above code shows how to
+construct a simple two-node graph. You can find that the syntax of the example is not that different from existing
+computational graph IR like NNVMv1, with the only difference in terms of terminology:
+
+- Existing frameworks usually uses graph and subgraph
+- Relay uses function e.g. -- ``fn (%x)``, to indicate the graph
+
+Each data-flow node is a CallNode in Relay. The relay python DSL allows you to construct a data-flow quickly.
+One thing we want to highlight in the above code -- is that we explicitly constructed an Add node with
+both input points to ``%1``. When a deep learning framework evaluates the above program, it will compute
+the nodes in topological order, and ``%1`` will only be computed once.
+While the this fact is very natural to deep learning framework builders, it is something that might
+surprise a PL folk in the first place. If we implement a simple visitor to print out the result and
+treat the result as nested Call expression, it becomes ``log(%x) + log(%x)``.
+
+Such ambiguity is caused by the different interpretations of program semantics when there is a shared node in the DAG.
+In a normal functional programming IR, nested expressions are treated as expression trees, without considering the
+fact that the ``%1`` is actually reused twice in ``%2``.
+
+Relay IR chooses to be mindful of this difference. Usually, deep learning framework users build the computational
+graph in this fashion, where a DAG node reuse often occurs.
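The difference between the tree view and the DAG view can be sketched in plain Python. The node classes below are hypothetical toy stand-ins, not the actual Relay data structures:

```python
import math

# Hypothetical toy node classes -- illustrative stand-ins, not the Relay API.
class Var:
    def __init__(self, name):
        self.name = name

class Call:
    def __init__(self, op, args):
        self.op, self.args = op, args

def to_text(e):
    """Naive visitor: treats the expression as a tree, so sharing is lost."""
    if isinstance(e, Var):
        return e.name
    return "%s(%s)" % (e.op, ", ".join(to_text(a) for a in e.args))

def evaluate(e, env, memo=None):
    """DAG-aware evaluation: memoize by node identity, so a shared node runs once."""
    if memo is None:
        memo = {}
    if id(e) not in memo:
        if isinstance(e, Var):
            memo[id(e)] = env[e.name]
        elif e.op == "log":
            memo[id(e)] = math.log(evaluate(e.args[0], env, memo))
        elif e.op == "add":
            memo[id(e)] = (evaluate(e.args[0], env, memo) +
                           evaluate(e.args[1], env, memo))
    return memo[id(e)]

x = Var("x")
n1 = Call("log", [x])        # %1 = log(%x)
n2 = Call("add", [n1, n1])   # %2 = add(%1, %1): both inputs share %1

print(to_text(n2))                   # add(log(x), log(x)) -- the tree view duplicates %1
print(evaluate(n2, {"x": math.e}))   # 2.0 -- log is evaluated only once
```

The memo keyed on node identity is what makes the evaluator respect the DAG semantics rather than the tree semantics.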
As a result, when we print out the Relay program in
+the text format, we print one CallNode per line and assign a temporary id ``(%1, %2)`` to each CallNode so each common
+node can be referenced in later parts of the program.
+
+Module: Support Multiple Functions (Graphs)
+-------------------------------------------
+So far we have introduced how we can build a data flow graph as a function. One might naturally ask -- can we support multiple
+functions and enable them to call each other? Relay allows grouping multiple functions together in a module; the code below
+shows an example of a function calling another function.
+
+.. code::
+
+  def @muladd(%x, %y, %z) {
+    %1 = mul(%x, %y)
+    %2 = add(%1, %z)
+    %2
+  }
+  def @myfunc(%x) {
+    %1 = @muladd(%x, 1, 2)
+    %2 = @muladd(%1, 2, 3)
+    %2
+  }
+
+The Module can be viewed as a ``Map``. Here GlobalVar is just an id that is used to represent the functions
+in the module. ``@muladd`` and ``@myfunc`` are GlobalVars in the above example. When a CallNode is used to call another function,
+the corresponding GlobalVar is stored in the op field of the CallNode. It contains a level of indirection -- we need to look up
+body of the called function from the modele using the corresponding GlobalVar. In this particular case, we could also directly
+store the reference to the Function as op in the CallNode. So, why do we need to introduce GlobalVar? The main reason is that
+GlobalVar decouples the definition/declaration and enables recursion and delayed declaration of the function.
+
+.. code::
+
+  def @myfunc(%x) {
+    %1 = equal(%x, 1)
+    if (%1) {
+      %x
+    } else {
+      %2 = sub(%x, 1)
+      %3 = @myfunc(%2)
+      %4 = add(%3, %3)
+      %4
+    }
+  }
+
+In the above example, ``@myfunc`` recursively calls itself. Using GlobalVar ``@myfunc`` to represent the function avoids
+the cyclic dependency in the data structure.
+At this point, we have introduced the basic concepts in Relay.
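The role of the indirection can be sketched with a plain Python dict standing in for the module; the names and helpers here are hypothetical, not the Relay API:

```python
# A module maps global names to functions; a call site stores only the name.
module = {}

def define(name, fn):
    module[name] = fn

def call(name, *args):
    # The level of indirection: resolve the GlobalVar-like name at call time,
    # so a function body can refer to functions defined later, or to itself.
    return module[name](*args)

define("muladd", lambda x, y, z: x * y + z)
define("myfunc", lambda x: x if x == 1
       else call("myfunc", x - 1) + call("myfunc", x - 1))

print(call("muladd", 2, 3, 4))  # 10
print(call("myfunc", 4))        # 8: recursion works without a cyclic reference
```

Because each body holds only the name ``"myfunc"`` rather than a direct reference to itself, the data structure stays acyclic, which is exactly the point made about GlobalVar above.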
Notably, Relay has the following improvements over NNVMv1:
+
+- Succinct text format that eases debugging when writing passes.
+- First-class support for subgraph functions in a joint module; this enables further chances for joint optimizations such as inlining and calling convention specification.
+- Native front-end language interop, for example, all the data structures can be visited in Python, which allows quick prototyping of optimizations in Python and mixing them with C++ code.
+
+
+Let Binding and Scopes
+----------------------
+
+So far, we have introduced how to build a computational graph in the good old way used in deep learning frameworks.
+This section will talk about a new important construct introduced by Relay -- let bindings.
+
+Let binding is used in every high-level programming language. In Relay, it is a data structure with three
+fields ``Let(var, value, body)``. When we evaluate a let expression, we first evaluate the value part, assign
+it to the var, and then evaluate and return the body expression.
+
+You can use a sequence of let bindings to construct a logically equivalent program to a data-flow program.
+The code example below shows one program with two forms side by side.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
+   :align: center
+   :scale: 70%
+
+
+The nested let-binding is called A-normal form, and it is commonly used as IRs in functional programming languages.
+Now, please take a close look at the AST structure. While the two programs are semantically identical
+(so are their textual representations, except that A-normal form has let prefix), there AST structure are different from each other.
+
+Since program optimizations take these AST data structures and transform them, the two different structure will
+affect the compiler code we are going to write.
For example, if we want to detect a pattern ``add(log(x), y)``:
+
+- In the data-flow form, we can first access the add node, then directly look at its first arguments to see if it is a log
+- In the A-normal form, we cannot directly do the check anymore, because the first input to add is ``%v1`` -- we will need to keep a map from variable to its bound values and lookup that map, in order to know that ``%v1`` is a log.
+
+Different data structures will impact how you might write transformations, and we need to keep that in mind.
+So now, as a deep learning framework developer, you might ask, why do we need let-binding.
+Yours PL friends will always tell you that let is important -- as PL is a quite established field,
+there must be some wisdom behind that.
+
+
+Why We Might Need Let Binding
+-----------------------------
+One key usage of let binding is that it specifies the scope of computation. Let us take a look at the following example,
+which does not use let binding.
+
+.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
+   :align: center
+   :scale: 70%
+
+The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems
+to suggest that we should evaluate node ``%1`` outside the if checking, the AST(as shown in the picture) does not show that relation.
+Actually, a dataflow graph never defines its scope of the evaluation. This introduces some ambiguity in the semantics.
+
+This ambiguity becomes more interesting when we have closures. Consider the following program, which returns a closure.
+We don’t know where should we compute ``%1``. It can either be outside the closure, or inside the closure.
+
+.. code::
+
+  fn (%x) {
+    %1 = log(%x)
+    %2 = fn(%y) {
+      add(%y, %1)
+    }
+    %2
+  }
+
+Let binding solves this problem, as the computation of the value happens at the let node.
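The evaluation rule for ``Let(var, value, body)`` can be sketched as a tiny interpreter; the classes and conventions here are hypothetical toys, not the Relay implementation:

```python
# Toy interpreter for let-bindings -- illustrative, not the Relay evaluator.
class Let:
    def __init__(self, var, value, body):
        self.var, self.value, self.body = var, value, body

def evaluate(expr, env):
    """Evaluate the value once at the let node, bind it, then run the body."""
    if isinstance(expr, Let):
        bound = evaluate(expr.value, env)                      # computed HERE,
        return evaluate(expr.body, {**env, expr.var: bound})   # in let's scope
    if isinstance(expr, str):        # a variable reference
        return env[expr]
    if callable(expr):               # a primitive computation over the env
        return expr(env)
    return expr                      # a constant

# let %v1 = x * 2 in %v1 + %v1
prog = Let("%v1",
           lambda env: env["x"] * 2,
           lambda env: env["%v1"] + env["%v1"])
print(evaluate(prog, {"x": 3}))   # 12
```

The point of the sketch is that the binding site of ``%v1`` is unambiguous: the value is computed exactly where the ``Let`` node sits, which is the property the surrounding text argues for.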
In both programs, +if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specifies the computation location to +be outside of the if scope and closure. As you can see let-binding gives a more precise specification of the computation site +and could be useful when we generate backend code(as such specification is in the IR). + +On the other hand, the data-flow form, which does not specify the scope of computation, does have its own advantages +-- we don’t need to worry about where to put the let when we generate the code. The dataflow form also gives more freedom +to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use data flow +form of the program in the initial phases of optimizations when you find it is convenient. +As a matter of fact, many optimizations in relay today are written to optimize dataflow programs. + +However, when we lower the IR to actual runtime program, we need to be precise about the scope of computation. +In particular, we want to explicitly specify where the scope of computation should happen when we are using +sub-functions and closures. Let-binding is used to solve this problem in later stage execution specific optimizations. + + +Implication on IR Transformations +--------------------------------- + +Hopefully, by now you are familiar with the two kinds of representations. +Most functional programming languages do their analysis in A-normal form, in the case of A-normal form, +the analyzer does not need to be mindful that the expressions are DAGs. + +Relay choose to support both the data-flow form and let binding. We believe that it is important to let the +framework developer choose the representation they are familiar with. +This does, however, have some implications on how we write passes: + +- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. 
This is a likely means a minimum change as we already need a map from expr-> transformed expression anyway. Note that this will effectively remove all the let in the program.
+- If you come from a PL background and like A-normal form, we will provide a dataflow -> A-normal form pass.
+- For PL folks, when you are implementing something (like dataflow->ANF transformation), be mindful that the expression can be DAG, and this usually means that we should visit expressions with a ``Map`` and only compute the transformed result once, so the result expression keeps the common structure.
+
+There are additional advanced concepts such as symbolic shape inference and polymorphic functions
+that are not covered by this material; you are more than welcome to look at other materials.

From cba72f5637b1c70df315354785d07c1ed2b69b34 Mon Sep 17 00:00:00 2001
From: tqchen
Date: Wed, 28 Nov 2018 11:31:30 -0800
Subject: [PATCH 2/5] Update per review comments

---
 docs/dev/relay_intro.rst | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst
index 8c5b08eeba89..b0f40bf026ab 100644
--- a/docs/dev/relay_intro.rst
+++ b/docs/dev/relay_intro.rst
@@ -3,8 +3,6 @@ Introduction to Relay IR
 This article introduces Relay IR -- the second generation of NNVM.
 We expect readers from two kinds of background -- those who have a programming language background and deep learning
 framework developers who are familiar with the computational graph representation.
-This article is mainly written for deep learning framework developers who are familiar with the computational graph representation.
-It can also be useful for PL background readers to understand the additional rationale of the deep learning framework builders.
 
 We briefly summarize the design goal here, and will touch upon these points in the later part of the article.
@@ -15,7 +13,7 @@ We briefly summarize the design goal here, and will touch upon these points in t Build Computational Graph with Relay ------------------------------------ Traditional deep learning frameworks use computational graphs as their intermediate representation. -A computational graph (or data-flow graph), is a directed acyclic graph (DAG) that represent the computation. +A computational graph (or data-flow graph), is a directed acyclic graph (DAG) that represents the computation. .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png :align: center @@ -26,12 +24,12 @@ You can use Relay to build a computational(dataflow) graph. Specifically, the ab construct a simple two-node graph. You can find that the syntax of the example is not that different from existing computational graph IR like NNVMv1, with the only difference in terms of terminology: -- Existing frameworks usually uses graph and subgraph +- Existing frameworks usually use graph and subgraph - Relay uses function e.g. -- ``fn (%x)``, to indicate the graph Each data-flow node is a CallNode in Relay. The relay python DSL allows you to construct a data-flow quickly. One thing we want to highlight in the above code -- is that we explicitly constructed an Add node with -both input points to ``%1``. When a deep learning framework evaluates the above program, it will compute +both input point to ``%1``. When a deep learning framework evaluates the above program, it will compute the nodes in topological order, and ``%1`` will only be computed once. While the this fact is very natural to deep learning framework builders, it is something that might surprise a PL folk in the first place. If we implement a simple visitor to print out the result and @@ -68,7 +66,7 @@ shows an example of a function calling another function. The Module can be viewed as a ``Map``. Here GlobalVar is just an id that is used to represent the functions in the module. 
``@muladd`` and ``@myfunc`` are GlobalVars in the above example. When a CallNode is used to call another function, the corresponding GlobalVar is stored in the op field of the CallNode. It contains a level of indirection -- we need to look up -body of the called function from the modele using the corresponding GlobalVar. In this particular case, we could also directly +body of the called function from the module using the corresponding GlobalVar. In this particular case, we could also directly store the reference to the Function as op in the CallNode. So, why do we need to introduce GlobalVar? The main reason is that GlobalVar decouples the definition/declaration and enables recursion and delayed declaration of the function. @@ -125,7 +123,7 @@ affect the compiler code we are going to write. For example, if we want to detec Different data structures will impact how you might write transformations, and we need to keep that in mind. So now, as a deep learning framework developer, you might ask, why do we need let-binding. -Yours PL friends will always tell you that let is important -- as PL is a quite established field, +Your PL friends will always tell you that let is important -- as PL is a quite established field, there must be some wisdom behind that. @@ -156,7 +154,7 @@ We don’t know where should we compute ``%1``. It can either be outside the clo } Let binding solves this problem, as the computation of the value happens at the let node. In both programs, -if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specifies the computation location to +if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specify the computation location to be outside of the if scope and closure. As you can see let-binding gives a more precise specification of the computation site and could be useful when we generate backend code(as such specification is in the IR). 
@@ -164,7 +162,7 @@ On the other hand, the data-flow form, which does not specify the scope of compu -- we don’t need to worry about where to put the let when we generate the code. The dataflow form also gives more freedom to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use data flow form of the program in the initial phases of optimizations when you find it is convenient. -As a matter of fact, many optimizations in relay today are written to optimize dataflow programs. +As a matter of fact, many optimizations in Relay today are written to optimize dataflow programs. However, when we lower the IR to actual runtime program, we need to be precise about the scope of computation. In particular, we want to explicitly specify where the scope of computation should happen when we are using @@ -182,7 +180,7 @@ Relay choose to support both the data-flow form and let binding. We believe that framework developer choose the representation they are familiar with. This does, however, have some implications on how we write passes: -- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. This is a likely means a minimum change as we already need a map from expr-> transformed expression anyway. Note that this will effectively remove all the let in the program. +- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. This likely means a minimum change as we already need a map from expr-> transformed expression anyway. Note that this will effectively remove all the let in the program. - If you come from a PL background and like A-normal form, we will provide a dataflow -> A-normal form pass. 
- For PL folks, when you are implementing something (like dataflow->ANF transformation), be mindful that the expression can be DAG, and this usually means that we should visit expressions with a ``Map`` and only compute the transformed result once, so the result expression keeps the common structure. From 0ee317e3589ca6bb1070d46c51364641e6c63b00 Mon Sep 17 00:00:00 2001 From: tqchen Date: Wed, 28 Nov 2018 21:38:20 -0800 Subject: [PATCH 3/5] minor fixes --- docs/dev/relay_intro.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst index b0f40bf026ab..d47d4579f3fe 100644 --- a/docs/dev/relay_intro.rst +++ b/docs/dev/relay_intro.rst @@ -113,7 +113,7 @@ The code example below shows one program with two forms side by side. The nested let-binding is called A-normal form, and it is commonly used as IRs in functional programming languages. Now, please take a close look at the AST structure. While the two programs are semantically identical -(so are their textual representations, except that A-normal form has let prefix), there AST structure are different from each other. +(so are their textual representations, except that A-normal form has let prefix), their AST structures are different from each other. Since program optimizations take these AST data structures and transform them, the two different structure will affect the compiler code we are going to write. For example, if we want to detect a pattern ``add(log(x), y)``: @@ -180,7 +180,7 @@ Relay choose to support both the data-flow form and let binding. We believe that framework developer choose the representation they are familiar with. This does, however, have some implications on how we write passes: -- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. 
This likely means a minimum change as we already need a map from expr-> transformed expression anyway. Note that this will effectively remove all the let in the program. +- If you come from a data-flow background and want to handle let, keep a map of var to the expressions so you can perform lookup when encountering a var. This likely means a minimum change as we already need a map from expr -> transformed expression anyway. Note that this will effectively remove all the let in the program. - If you come from a PL background and like A-normal form, we will provide a dataflow -> A-normal form pass. - For PL folks, when you are implementing something (like dataflow->ANF transformation), be mindful that the expression can be DAG, and this usually means that we should visit expressions with a ``Map`` and only compute the transformed result once, so the result expression keeps the common structure. From a87b1560aa9962ea759b61a4e9138e568ed9450e Mon Sep 17 00:00:00 2001 From: tqchen Date: Wed, 28 Nov 2018 21:41:42 -0800 Subject: [PATCH 4/5] minor fix --- docs/dev/relay_intro.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst index d47d4579f3fe..7487b12fe649 100644 --- a/docs/dev/relay_intro.rst +++ b/docs/dev/relay_intro.rst @@ -137,7 +137,7 @@ which does not use let binding. :scale: 70% The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems -to suggest that we should evaluate node ``%1`` outside the if checking, the AST(as shown in the picture) does not show that relation. +to suggest that we should evaluate node ``%1`` outside the if scope, the AST(as shown in the picture) does not suggest so. Actually, a dataflow graph never defines its scope of the evaluation. This introduces some ambiguity in the semantics. This ambiguity becomes more interesting when we have closures. 
Consider the following program, which returns a closure. @@ -162,19 +162,19 @@ On the other hand, the data-flow form, which does not specify the scope of compu -- we don’t need to worry about where to put the let when we generate the code. The dataflow form also gives more freedom to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use data flow form of the program in the initial phases of optimizations when you find it is convenient. -As a matter of fact, many optimizations in Relay today are written to optimize dataflow programs. +Many optimizations in Relay today are written to optimize dataflow programs. However, when we lower the IR to actual runtime program, we need to be precise about the scope of computation. In particular, we want to explicitly specify where the scope of computation should happen when we are using -sub-functions and closures. Let-binding is used to solve this problem in later stage execution specific optimizations. +sub-functions and closures. Let-binding can be used to solve this problem in later stage execution specific optimizations. Implication on IR Transformations --------------------------------- Hopefully, by now you are familiar with the two kinds of representations. -Most functional programming languages do their analysis in A-normal form, in the case of A-normal form, -the analyzer does not need to be mindful that the expressions are DAGs. +Most functional programming languages do their analysis in A-normal form, +where the analyzer does not need to be mindful that the expressions are DAGs. Relay choose to support both the data-flow form and let binding. We believe that it is important to let the framework developer choose the representation they are familiar with. 
From e0467a5207fd877f07d09aa2b20764e8f813c742 Mon Sep 17 00:00:00 2001 From: tqchen Date: Wed, 28 Nov 2018 21:55:16 -0800 Subject: [PATCH 5/5] fix thanks to zhiics --- docs/dev/relay_intro.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dev/relay_intro.rst b/docs/dev/relay_intro.rst index 7487b12fe649..d3c83590cbb8 100644 --- a/docs/dev/relay_intro.rst +++ b/docs/dev/relay_intro.rst @@ -31,7 +31,7 @@ Each data-flow node is a CallNode in Relay. The relay python DSL allows you to c One thing we want to highlight in the above code -- is that we explicitly constructed an Add node with both input point to ``%1``. When a deep learning framework evaluates the above program, it will compute the nodes in topological order, and ``%1`` will only be computed once. -While the this fact is very natural to deep learning framework builders, it is something that might +While this fact is very natural to deep learning framework builders, it is something that might surprise a PL folk in the first place. If we implement a simple visitor to print out the result and treat the result as nested Call expression, it becomes ``log(%x) + log(%x)``. @@ -118,7 +118,7 @@ Now, please take a close look at the AST structure. While the two programs are s Since program optimizations take these AST data structures and transform them, the two different structure will affect the compiler code we are going to write. For example, if we want to detect a pattern ``add(log(x), y)``: -- In the data-flow form, we can first access the add node, then directly look at its first arguments to see if it is a log +- In the data-flow form, we can first access the add node, then directly look at its first argument to see if it is a log - In the A-normal form, we cannot directly do the check anymore, because the first input to add is ``%v1`` -- we will need to keep a map from variable to its bound values and lookup that map, in order to know that ``%v1`` is a log. 
Different data structures will impact how you might write transformations, and we need to keep that in mind.
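As a closing illustration, the memoized visit recommended above -- transforming a dataflow DAG into A-normal form with a ``Map`` so each shared node is converted once -- can be sketched in plain Python (toy node classes, not the actual Relay pass infrastructure):

```python
# Hypothetical toy nodes -- not the Relay API.
class Var:
    def __init__(self, name):
        self.name = name

class Call:
    def __init__(self, op, args):
        self.op, self.args = op, args

def to_anf(expr):
    """Flatten a (possibly shared) expression DAG into let-style bindings.

    A memo map from node identity to bound variable ensures every shared
    node is transformed once, so the output keeps the common structure.
    """
    bindings, memo, counter = [], {}, [0]

    def visit(e):
        if isinstance(e, Var):
            return e.name
        if id(e) in memo:               # already bound: reuse its variable
            return memo[id(e)]
        args = [visit(a) for a in e.args]
        counter[0] += 1
        v = "%%v%d" % counter[0]
        bindings.append("let %s = %s(%s)" % (v, e.op, ", ".join(args)))
        memo[id(e)] = v
        return v

    result = visit(expr)
    return bindings, result

x = Var("%x")
n1 = Call("log", [x])
n2 = Call("add", [n1, n1])     # %1 reused twice
lets, res = to_anf(n2)
print(lets)  # ['let %v1 = log(%x)', 'let %v2 = add(%v1, %v1)']
print(res)   # %v2
```

Without the ``memo`` lookup, the shared ``log`` node would be bound twice and the transformed program would lose the common structure that the original DAG expressed.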