From a356ba72d2b08240908acc2c4c0ac347ce8176ee Mon Sep 17 00:00:00 2001 From: Andrew Luo Date: Tue, 20 Apr 2021 16:12:37 -0700 Subject: [PATCH 1/4] first draft of add op --- docs/dev/relay_add_op.rst | 376 +++++++++++++++++++++++++++++++------- 1 file changed, 310 insertions(+), 66 deletions(-) diff --git a/docs/dev/relay_add_op.rst b/docs/dev/relay_add_op.rst index 0697939be162..bc5e52426bfe 100644 --- a/docs/dev/relay_add_op.rst +++ b/docs/dev/relay_add_op.rst @@ -15,51 +15,155 @@ specific language governing permissions and limitations under the License. -.. _relay-add-op: +.. _relay-add-op: Adding an Operator to Relay -Adding an Operator to Relay =========================== -In order to use TVM operators from within the Relay IR, the -operators need to be registered in Relay in order to ensure -that they will be integrated into Relay's type system. +In this document we will go over the steps needed to register a new TVM operator +in relay using this PR which adds a `cumulative product`_ operation as an example. +The PR itself builds upon another PR which adds a `cumulative sum`_ operation. -Registering an operator requires three steps: +.. _cumulative product: https://github.com/apache/tvm/pull/7722 +.. _cumulative sum: https://github.com/apache/tvm/pull/7334 -- Using the ``RELAY_REGISTER_OP`` macro in C++ to register the operator's arity and type information -- Defining a C++ function to produce a call node for the operator and registering a Python API hook for the function -- Wrapping the above Python API hook in a neater interface +Registering a new operator requires a few steps: -The file ``src/relay/op/tensor/binary.cc`` provides -examples of the first two steps, while -``python/tvm/relay/op/tensor.py`` gives examples of the -last. +1. Add an attribute node declaring fixed arguments which are known at compile time +2. Write a type relation for your operation to integrate into Relay's type system. +3. 
Use the ``RELAY_REGISTER_OP`` macro in C++ to register the operator's arity, type, and other hints for the compiler +4. Write how the operator is computed +5. Register the compute, strategy with the relay operator +6. Defining a C++ function to produce a call node for the operator and registering a Python API hook for the function +7. Wrapping the above Python API hook in a neater interface +8. Writing tests for the new relay operator -Registering an Operator ------------------------ +1. Defining an Attribute Node +----------------------------- +Attributes are fixed arguments which are supposed to be known at compile time. The stride and dilation of a convolution +operator would be an appropriate example of fields which might belong in an attribute node for a convolution operator. -TVM already has an operator registry, but Relay cannot properly -incorporate TVM operators without additional type information. +Attributes should be defined in a file within the folder `include/tvm/relay/attrs/`_. +.. _include/tvm/relay/attrs/: https://github.com/apache/tvm/tree/main/include/tvm/relay/attrs + +Ultimately we want to create an operator whose interface can be seen clearly in the final python interface: + +.. code:: python + + def cumprod(data, axis=None, dtype=None, exclusive=None): + """Numpy style cumprod op. Return the cumulative inclusive product of the elements along + a given axis. + Parameters + ---------- + data : relay.Expr + The input data to the operator. + axis : int, optional + Axis along which the cumulative product is computed. The default (None) is to compute + the cumprod over the flattened array. + dtype : string, optional + Type of the returned array and of the accumulator in which the elements are multiplied. + If dtype is not specified, it defaults to the dtype of data. + exclusive : bool, optional + If true will return exclusive product in which the first element is not + included. 
In other terms, if true, the j-th output element would be
+        the product of the first (j-1) elements. Otherwise, it would be the product of
+        the first j elements. The product of zero elements will be 1.
+    Returns
+    -------
+    result : relay.Expr
+        The result has the same size as data, and the same shape as data if axis is not None.
+        If axis is None, the result is a 1-d array.
+    Examples
+    --------
+    .. code-block:: python
+        a = [[1,2,3], [4,5,6]]
+        cumprod(a)  # if axis is not provided, cumprod is done over the flattened input.
+        -> [ 1, 2, 6, 24, 120, 720]
+        cumprod(a, dtype="float32")
+        -> [ 1., 2., 6., 24., 120., 720.]
+        cumprod(a, axis=0)  # multiply over rows for each of the 3 columns
+        -> [[1, 2, 3],
+            [4, 10, 18]]
+        cumprod(a, axis=1)
+        -> [[ 1, 2, 6],
+            [ 4, 20, 120]]
+        a = [1, 1, 1, 0, 1, 1, 0]  # a is a boolean array
+        cumprod(a, dtype=int32)  # dtype should be provided to get the expected results
+        -> [1, 1, 1, 0, 0, 0, 0]
+    """
+
+Therefore, when defining our attributes in ``include/tvm/relay/attrs/transform.h`` we choose the axis,
+accumulation dtype, and exclusivity of the operation as appropriate fields for the struct.
+
+.. code:: c++
+
+    /*! \brief Attributes used in cumsum and cumprod operator */
+    struct ScanopAttrs : public tvm::AttrsNode<ScanopAttrs> {
+      Integer axis;
+      DataType dtype;
+      Bool exclusive = Bool(false);
+      TVM_DECLARE_ATTRS(ScanopAttrs, "relay.attrs.ScanopAttrs") {
+        TVM_ATTR_FIELD(axis).describe("The axis to operate over").set_default(NullValue<Integer>());
+        TVM_ATTR_FIELD(dtype).describe("Output data type").set_default(NullValue<DataType>());
+        TVM_ATTR_FIELD(exclusive)
+            .describe("The first element is not included")
+            .set_default(Bool(false));
+      }
+    };
+
+2. Writing a Type Relation
+--------------------------
 To allow for flexibility in registering operators and greater
 expressivity and granularity in expressing types in Relay, operators
 are typed using relations between input and output types. 
These relations are represented as
 functions that take in a list of input types and output types (any of
 these types may be incomplete) and return a list
-of input and output types that satisfies the relation. Essentially, a
+of input and output types that satisfies the relation. This includes shape
+information which can be determined statically at compile time. Essentially, a
 relation for an operator can enforce all the necessary typing rules (namely by
 inspecting the input types) in addition to computing the
 output type.
 
-For example, see ``src/relay/op/type_relations.h`` and their
-implementations. E.g., ``BroadcastRel`` takes two input types and an
-output type, checks that they are all tensor types with the same underlying
-data type, and finally ensures that the shape of the output type is the
-broadcast of the input types' shapes.
+Type relation for the cumulative product and sum can be found in
+``src/relay/op/tensor/transform.cc``:
+
+.. code:: c++
+
+    TVM_REGISTER_NODE_TYPE(ScanopAttrs);
+    bool ScanopRel(const Array<Type>& types, int num_inputs, const Attrs& attrs, const TypeReporter& reporter) {
+      // types: [data, output]
+      ICHECK_EQ(types.size(), 2) << "Expects two types, one for the input and another for the output";
+      const auto* data = types[0].as<TensorTypeNode>();
+      if (data == nullptr) {
+        ICHECK(types[0].as<IncompleteTypeNode>())
+            << "Scanop: expect input type to be TensorType but get " << types[0];
+        return false;
+      }
+
+      const auto* param = attrs.as<ScanopAttrs>();
+
+      auto dtype = param->dtype;
+      if (dtype.is_void()) {
+        dtype = data->dtype;
+      }
+
+      if (param->axis.defined()) {
+        reporter->Assign(types[1], TensorType(data->shape, dtype));
+      } else {
+        auto prod = data->shape[0];
+        for (size_t i = 1; i < data->shape.size(); ++i) {
+          prod = prod * data->shape[i];
+        }
+        reporter->Assign(types[1], TensorType({prod}, dtype));
+      }
+
+      return true;
+    }
 
-It may be necessary to add another type relation to ``type_relations.h``
-if the existing ones do not capture the behavior of the desired operator.
+3. 
Relating the Arity and Attributes to an Operation
+----------------------------------------------------
+We then register the name of our new ops and annotate them with the calling interface.
 The ``RELAY_REGISTER_OP`` macro in C++ allows a developer
 to specify the following information about an operator in Relay:
@@ -67,22 +171,153 @@ to specify the following information about an operator in Relay:
 - Names and descriptions for positional arguments
 - Support level (1 indicates an internal intrinsic; higher numbers indicate less integral or externally supported operators)
 - A type relation for the operator
+- Other annotations useful when optimizing the operation.
+
+Once again we add this to ``src/relay/op/tensor/transform.cc``:
+
+.. code:: c++
+
+    RELAY_REGISTER_OP("cumsum")
+        .describe(
+            R"doc(Return the cumulative sum of the elements along a given axis.)doc" TVM_ADD_FILELINE)
+        .set_num_inputs(1)
+        .add_argument("data", "Tensor", "The input tensor.")
+        .set_support_level(3)
+        .add_type_rel("Cumsum", ScanopRel)
+        .set_attr<TOpPattern>("TOpPattern", kOpaque);
+
+    RELAY_REGISTER_OP("cumprod")
+        .describe(
+            R"doc(Return the cumulative product of the elements along a given axis.)doc" TVM_ADD_FILELINE)
+        .set_num_inputs(1)
+        .add_argument("data", "Tensor", "The input tensor.")
+        .set_support_level(3)
+        .add_type_rel("Cumprod", ScanopRel)
+        .set_attr<TOpPattern>("TOpPattern", kOpaque);
+
+In this case the ``TOpPattern`` is a hint to the compiler on the pattern of computation which might be
+useful for reordering loops and fusing operators. ``kOpaque`` tells TVM not to not bother trying to fuse this operator.
+
+4. Defining the Compute of the Operation
+----------------------------------------
+
+While we've now defined the interface for the operation we still have not
+told TVM how to perform the calculation for cumulative sum and product.
+
+Writing this code is outside the scope of the tutorial. For now, we assume
+we have a well tested implementation for the operation's compute. 
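Before moving on, it helps to pin down exactly what that compute must produce, especially the ``exclusive`` flag. A short pure-Python sketch of the sequential scan (illustrative only; the helper name ``cumprod_1d`` is hypothetical, and this is not the PR's TIR kernel):

```python
from itertools import accumulate
from operator import mul

def cumprod_1d(data, exclusive=False):
    """Pure-Python sketch of the scan a cumprod kernel performs over a
    flattened input. Illustrative only: the real implementations live in
    python/tvm/topi/scan.py and python/tvm/topi/cuda/scan.py."""
    inclusive = list(accumulate(data, mul))
    if not exclusive:
        return inclusive
    # Exclusive scan: output[j] is the product of the first j elements,
    # so position 0 holds the multiplicative identity, 1.
    return [1] + inclusive[:-1]
```

For example, ``cumprod_1d([1, 2, 3, 4, 5, 6])`` gives ``[1, 2, 6, 24, 120, 720]``, matching the docstring above, while the exclusive variant gives ``[1, 1, 2, 6, 24, 120]``.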
For +more details on how to do this, we recommend looking up the tutorials +on `tensor expressions`_, `TVM's operator inventory (topi)`_ and looking at the +example compute for cumulative operations found in `python/tvm/topi/scan.py`_ and +`python/tvm/topi/cuda/scan.py`_. In the case of our cumulative sum and product operations +we write things directly in `TIR`_ which is the representation where tensor expressions +and topi will lower into. + +.. _tensor expressions: https://tvm.apache.org/docs/tutorials/get_started/tensor_expr_get_started.html +.. _TVM's operator inventory (topi): https://tvm.apache.org/docs/tutorials/topi/intro_topi.html +.. _TIR: https://tvm.apache.org/docs/dev/index.html?highlight=tir#tvm-tir +.. _python/tvm/topi/scan.py: https://github.com/apache/tvm/blob/main/python/tvm/topi/scan.py +.. _python/tvm/topi/cuda/scan.py: https://github.com/apache/tvm/blob/main/python/tvm/topi/cuda/scan.py + +5. Hooking up Compute and Strategy with Relay +--------------------------------------------- + +After you have implemented how your function can be computed we now need to glue it to our +relay operation. Within TVM this means not only defining the computation, but also the schedule +for an operation. A strategy is a method which picks which computation and which schedule +to use. For example, for 2D convolutions we might recognize we are doing a depthwise convolution +and dispatch to a more efficient computation and schedule. In our case however we have +no such need except for dispatching between our CPU and GPU implementations. In +``python/tvm/relay/op/strategy/generic.py`` and ``python/tvm/relay/op/strategy/cuda.py`` we +add: -The below example is from ``binary.cc`` and uses a broadcasting -add for tensors: +.. code:: python -.. 
code:: c + def wrap_compute_scanop(topi_compute): + """Wrap scanop style topi compute""" + + def _compute_scanop(attrs, inputs, _): + return [topi_compute(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)] + + return _compute_scanop + + + @override_native_generic_func("cumsum_strategy") + def cumsum_strategy(attrs, inputs, out_type, target): + """cumsum generic strategy""" + strategy = _op.OpStrategy() + strategy.add_implementation( + wrap_compute_scanop(topi.cumsum), + wrap_topi_schedule(topi.generic.schedule_extern), + name="cumsum.generic", + ) + return strategy + + + @override_native_generic_func("cumprod_strategy") + def cumprod_strategy(attrs, inputs, out_type, target): + """cumprod generic strategy""" + strategy = _op.OpStrategy() + strategy.add_implementation( + wrap_compute_scanop(topi.cumprod), + wrap_topi_schedule(topi.generic.schedule_extern), + name="cumprod.generic", + ) + return strategy + + @cumsum_strategy.register(["cuda", "gpu"]) + def cumsum_strategy_cuda(attrs, inputs, out_type, target): + """cumsum cuda strategy""" + strategy = _op.OpStrategy() + strategy.add_implementation( + wrap_compute_scanop(topi.cuda.cumsum), + wrap_topi_schedule(topi.cuda.schedule_scan), + name="cumsum.cuda", + ) + return strategy + + + @cumprod_strategy.register(["cuda", "gpu"]) + def cumprod_strategy_cuda(attrs, inputs, out_type, target): + """cumprod cuda strategy""" + strategy = _op.OpStrategy() + strategy.add_implementation( + wrap_compute_scanop(topi.cuda.cumprod), + wrap_topi_schedule(topi.cuda.schedule_scan), + name="cumprod.cuda", + ) + return strategy + +Where in each strategy we define the compute we wrote and the schedule to use. +We finally link the strategy and compute with the defined relay operator in ``python/tvm/relay/op/_transform.py``: + +.. 
code:: python + + # cumsum + @_reg.register_compute("cumsum") + def compute_cumsum(attrs, inputs, output_type): + """Compute definition of cumsum""" + return [topi.cumsum(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)] - RELAY_REGISTER_OP("add") - .set_num_inputs(2) - .add_argument("lhs", "Tensor", "The left hand side tensor.") - .add_argument("rhs", "Tensor", "The right hand side tensor.") - .set_support_level(1) - .add_type_rel("Broadcast", BroadcastRel); -Creating a Call Node --------------------- + _reg.register_strategy("cumsum", strategy.cumsum_strategy) + _reg.register_shape_func("cumsum", False, elemwise_shape_func) + # cumprod + @_reg.register_compute("cumprod") + def compute_cumprod(attrs, inputs, output_type): + """Compute definition of cumprod""" + return [topi.cumprod(inputs[0], attrs.axis, attrs.dtype, attrs.exclusive)] + + + _reg.register_strategy("cumprod", strategy.cumprod_strategy) + _reg.register_shape_func("cumprod", False, elemwise_shape_func) + +The shape functions are used for determining output shape given a dynamically shaped tensor. In this +case we tell TVM the output shape will be the same as the input shape. + +6. Creating a Relay Call Node and Exposing a Python Hook +-------------------------------------------------------- This step requires simply writing a function that takes the arguments to the operator (as Relay expressions) and returning a call node to the operator (i.e., the node that @@ -92,46 +327,50 @@ operator is intended). At present call attributes and type arguments (the last two fields) are not supported, so it suffices to use ``Op::Get`` to fetch the operator's information from the operator registry and pass in -the arguments to the call node, as below. +the arguments to the call node, as below. In ``src/relay/op/tensor/transform.cc``: -.. code:: c +.. 
code:: c++
+
+    Expr MakeCumsum(Expr data, Integer axis, DataType dtype, Bool exclusive) {
+      auto attrs = make_object<ScanopAttrs>();
+      attrs->dtype = dtype;
+      attrs->axis = axis;
+      attrs->exclusive = exclusive;
+      static const Op& op = Op::Get("cumsum");
+      return Call(op, {data}, Attrs(attrs), {});
+    }
+
+    TVM_REGISTER_GLOBAL("relay.op._make.cumsum").set_body_typed(MakeCumsum);
+
+    Expr MakeCumprod(Expr data, Integer axis, DataType dtype, Bool exclusive) {
+      auto attrs = make_object<ScanopAttrs>();
+      attrs->dtype = dtype;
+      attrs->axis = axis;
+      attrs->exclusive = exclusive;
+      static const Op& op = Op::Get("cumprod");
+      return Call(op, {data}, Attrs(attrs), {});
+    }
 
-    TVM_REGISTER_GLOBAL("relay.op._make.add")
-    .set_body_typed([](Expr lhs, Expr rhs) {
-        static const Op& op = Op::Get("add");
-        return Call(op, {lhs, rhs}, Attrs(), {});
-      });
+    TVM_REGISTER_GLOBAL("relay.op._make.cumprod").set_body_typed(MakeCumprod);
 
-Including a Python API Hook
----------------------------
+Where TVM_REGISTER_GLOBAL exposes the ``MakeCumsum`` and ``MakeCumprod`` functions
+in Python via ``relay.op._make.cumsum(...)`` and ``relay.op._make.cumprod(...)``.
+
+7. Including a Cleaner Python API Hook
+--------------------------------------
 It is generally the convention in Relay, that functions exported
 through ``TVM_REGISTER_GLOBAL`` should be wrapped in a separate
-Python function rather than called directly in Python. In the case
-of the functions that produce calls to operators, it may be convenient
-to bundle them, as in ``python/tvm/relay/op/tensor.py``, where
-elementwise operators on tensors are all provided. For example,
-the following is how the add function from the previous section is
-exposed in Python:
+Python function rather than called directly in Python. For our
+operators we expose this cleaner interface in ``python/tvm/relay/op/transform.py``:
 
 .. code:: python
 
-    def add(lhs, rhs):
-        """Elementwise addition. 
+ def cumsum(data, axis=None, dtype=None, exclusive=None): + return _make.cumsum(data, axis, dtype, exclusive) - Parameters - ---------- - lhs : relay.Expr - The left hand side input data - rhs : relay.Expr - The right hand side input data - - Returns - ------- - result : relay.Expr - The computed result. - """ - return _make.add(lhs, rhs) + def cumprod(data, axis=None, dtype=None, exclusive=None): + return _make.cumprod(data, axis, dtype, exclusive) Note that these Python wrappers might also be good opportunities to provide an easier interface to the operator. For example, the @@ -156,6 +395,11 @@ before producing the call node: tup = Tuple(list(args)) return _make.concat(tup) +8. Writing Unit Tests! +---------------------- +This is self explanatory! Some example unit tests can be found in +``tests/python/relay/test_op_level3.py``. + Gradient Operators ------------------ From ef9b85557551290f97e0455ba425c8e90f4820ae Mon Sep 17 00:00:00 2001 From: Andrew Luo Date: Tue, 20 Apr 2021 16:32:32 -0700 Subject: [PATCH 2/4] first pass editting doc --- docs/dev/relay_add_op.rst | 62 +++++++++++++++------------------------ 1 file changed, 24 insertions(+), 38 deletions(-) diff --git a/docs/dev/relay_add_op.rst b/docs/dev/relay_add_op.rst index bc5e52426bfe..b7b0f2ae2df4 100644 --- a/docs/dev/relay_add_op.rst +++ b/docs/dev/relay_add_op.rst @@ -20,7 +20,7 @@ =========================== In this document we will go over the steps needed to register a new TVM operator -in relay using this PR which adds a `cumulative product`_ operation as an example. +in Relay. We will be following this PR which adds a `cumulative product`_ operation as an example. The PR itself builds upon another PR which adds a `cumulative sum`_ operation. .. _cumulative product: https://github.com/apache/tvm/pull/7722 @@ -32,8 +32,8 @@ Registering a new operator requires a few steps: 2. Write a type relation for your operation to integrate into Relay's type system. 3. 
Use the ``RELAY_REGISTER_OP`` macro in C++ to register the operator's arity, type, and other hints for the compiler
 4. Write how the operator is computed
-5. Register the compute, strategy with the relay operator
-6. Defining a C++ function to produce a call node for the operator and registering a Python API hook for the function
+5. Register the compute, schedule with the relay operator
+6. Define a C++ function to produce a call node for the operator and register a Python API hook for the function
 7. Wrapping the above Python API hook in a neater interface
 8. Writing tests for the new relay operator
 
@@ -73,25 +73,10 @@ Ultimately we want to create an operator whose interface can be seen clearly in
     result : relay.Expr
         The result has the same size as data, and the same shape as data if axis is not None.
         If axis is None, the result is a 1-d array.
-    Examples
-    --------
-    .. code-block:: python
-        a = [[1,2,3], [4,5,6]]
-        cumprod(a)  # if axis is not provided, cumprod is done over the flattened input.
-        -> [ 1, 2, 6, 24, 120, 720]
-        cumprod(a, dtype="float32")
-        -> [ 1., 2., 6., 24., 120., 720.]
-        cumprod(a, axis=0)  # multiply over rows for each of the 3 columns
-        -> [[1, 2, 3],
-            [4, 10, 18]]
-        cumprod(a, axis=1)
-        -> [[ 1, 2, 6],
-            [ 4, 20, 120]]
-        a = [1, 1, 1, 0, 1, 1, 0]  # a is a boolean array
-        cumprod(a, dtype=int32)  # dtype should be provided to get the expected results
-        -> [1, 1, 1, 0, 0, 0, 0]
     """
 
+A similar interface exists for ``cumsum()``.
+
 Therefore, when defining our attributes in ``include/tvm/relay/attrs/transform.h`` we choose the axis,
 accumulation dtype, and exclusivity of the operation as appropriate fields for the struct.
 
@@ -124,7 +109,7 @@ relation for an operator can enforce all the necessary typing rules (namely by
 inspecting the input types) in addition to computing the
 output type.
 
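The shape half of that rule can be sketched in plain Python before reading the C++ (the helper ``scanop_out_shape`` is hypothetical, for intuition only; the real relation operates on Relay ``TensorType`` objects and also resolves the output dtype):

```python
def scanop_out_shape(in_shape, axis=None):
    """Sketch of the shape rule a scan-op type relation enforces.

    With an axis, the output shape matches the input shape; with
    axis=None the input is conceptually flattened, so the output is
    1-D with as many elements as the input has in total.
    """
    if axis is not None:
        return list(in_shape)
    total = 1
    for dim in in_shape:
        total *= dim
    return [total]
```

So a ``(2, 3)`` input scanned with ``axis=0`` keeps its shape, while the same input scanned with ``axis=None`` yields a 1-D result of six elements.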
-Type relation for the cumulative product and sum can be found in
+Type relation for the cumulative product and sum operators can be found in
 ``src/relay/op/tensor/transform.cc``:
 
 .. code:: c++
 
@@ -195,20 +180,20 @@ Once again we add this to ``src/relay/op/tensor/transform.cc``:
         .add_type_rel("Cumprod", ScanopRel)
         .set_attr<TOpPattern>("TOpPattern", kOpaque);
 
-In this case the ``TOpPattern`` is a hint to the compiler on the pattern of computation which might be
+In this case the ``TOpPattern`` is a hint to the compiler on the pattern of computation the operator does, which might be
 useful for reordering loops and fusing operators. ``kOpaque`` tells TVM not to not bother trying to fuse this operator.
 
 4. Defining the Compute of the Operation
 ----------------------------------------
 
-While we've now defined the interface for the operation we still have not
-told TVM how to perform the calculation for cumulative sum and product.
+While we've now defined the interface for the operation but still have not
+told TVM how to perform the actual calculations for cumulative sum and product.
 
 Writing this code is outside the scope of the tutorial. For now, we assume
 we have a well tested implementation for the operation's compute. For
 more details on how to do this, we recommend looking up the tutorials
 on `tensor expressions`_, `TVM's operator inventory (topi)`_ and looking at the
-example compute for cumulative operations found in `python/tvm/topi/scan.py`_ and
+examples cumulative sum and product found in `python/tvm/topi/scan.py`_ and
 `python/tvm/topi/cuda/scan.py`_. In the case of our cumulative sum and product operations
 we write things directly in `TIR`_ which is the representation where tensor expressions
 and topi will lower into.
 
@@ -226,10 +211,10 @@ After you have implemented how your function can be computed we now need to glue
 relay operation. Within TVM this means not only defining the computation, but also the schedule
 for an operation. 
A strategy is a method which picks which computation and which schedule to use. For example, for 2D convolutions we might recognize we are doing a depthwise convolution -and dispatch to a more efficient computation and schedule. In our case however we have +and dispatch to a more efficient computation and schedule as a result. In our case however we have no such need except for dispatching between our CPU and GPU implementations. In ``python/tvm/relay/op/strategy/generic.py`` and ``python/tvm/relay/op/strategy/cuda.py`` we -add: +add the following strategies: .. code:: python @@ -288,7 +273,7 @@ add: ) return strategy -Where in each strategy we define the compute we wrote and the schedule to use. +Where in each strategy we define the compute we wrote and the schedule to use within ``add_implementation()``. We finally link the strategy and compute with the defined relay operator in ``python/tvm/relay/op/_transform.py``: .. code:: python @@ -318,7 +303,8 @@ case we tell TVM the output shape will be the same as the input shape. 6. Creating a Relay Call Node and Exposing a Python Hook -------------------------------------------------------- -This step requires simply writing a function that takes +We now have a working operation and now just need to properly call it +via a Relay Call Node. This step requires simply writing a function that takes the arguments to the operator (as Relay expressions) and returning a call node to the operator (i.e., the node that should be placed into the Relay AST where the call to the @@ -398,10 +384,17 @@ before producing the call node: 8. Writing Unit Tests! ---------------------- This is self explanatory! Some example unit tests can be found in -``tests/python/relay/test_op_level3.py``. +`tests/python/relay/test_op_level3.py`_ for our cumulative sum +and product operators. + +.. 
_tests/python/relay/test_op_level3.py: https://github.com/apache/tvm/blob/main/tests/python/relay/test_op_level3.py
+
+
+Other Topics
+------------
 
 Gradient Operators
------------------
+~~~~~~~~~~~~~~~~~~
 
 Gradient operators are important for writing differentiable programs in
 Relay. While it is the case that Relay's autodiff algorithm can differentiate
@@ -503,10 +496,3 @@ order to register the gradient.
       // Set other attributes
       // ...
       .set_attr<FPrimalGradient>("FPrimalGradient", MultiplyGrad);
-
-Summary
--------
-
-- A TVM operator can be registered in Relay using a relation to express the appropriate type information.
-- Using an operator in Relay requires a function to produce a call node for the operator.
-- It is best to have a simple Python wrapper for producing the call node.

From 0be24afaf8820885b81f8baa754db9f281afddf4 Mon Sep 17 00:00:00 2001
From: Andrew Luo
Date: Tue, 20 Apr 2021 16:36:18 -0700
Subject: [PATCH 3/4] make main title visible again

---
 docs/dev/relay_add_op.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/dev/relay_add_op.rst b/docs/dev/relay_add_op.rst
index b7b0f2ae2df4..b967face9b58 100644
--- a/docs/dev/relay_add_op.rst
+++ b/docs/dev/relay_add_op.rst
@@ -15,8 +15,9 @@ specific language governing permissions and limitations under the License.
 
-.. _relay-add-op: Adding an Operator to Relay
+.. 
_relay-add-op:
+Adding an Operator to Relay
 ===========================
 
 In this document we will go over the steps needed to register a new TVM operator

From 6ca48b3b28025e8633ed3e446824c2628ef0fdb0 Mon Sep 17 00:00:00 2001
From: Andrew Luo
Date: Wed, 21 Apr 2021 11:08:53 -0700
Subject: [PATCH 4/4] address masa's comments

---
 docs/dev/relay_add_op.rst | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/dev/relay_add_op.rst b/docs/dev/relay_add_op.rst
index b967face9b58..c5dce830ef00 100644
--- a/docs/dev/relay_add_op.rst
+++ b/docs/dev/relay_add_op.rst
@@ -182,22 +182,22 @@ Once again we add this to ``src/relay/op/tensor/transform.cc``:
         .set_attr<TOpPattern>("TOpPattern", kOpaque);
 
 In this case the ``TOpPattern`` is a hint to the compiler on the pattern of computation the operator does, which might be
-useful for reordering loops and fusing operators. ``kOpaque`` tells TVM not to not bother trying to fuse this operator.
+useful for fusing operators. ``kOpaque`` tells TVM to not bother trying to fuse this operator.
 
 4. Defining the Compute of the Operation
 ----------------------------------------
 
-While we've now defined the interface for the operation but still have not
-told TVM how to perform the actual calculations for cumulative sum and product.
+While we've now defined the interface for our operations we still need to define
+how to perform the actual calculations for cumulative sum and product.
 
 Writing this code is outside the scope of the tutorial. For now, we assume
 we have a well tested implementation for the operation's compute. For
 more details on how to do this, we recommend looking up the tutorials
 on `tensor expressions`_, `TVM's operator inventory (topi)`_ and looking at the
-examples cumulative sum and product found in `python/tvm/topi/scan.py`_ and
-`python/tvm/topi/cuda/scan.py`_. 
In the case of our cumulative sum and product operations
-we write things directly in `TIR`_ which is the representation where tensor expressions
-and topi will lower into.
+example cumulative sum and product implementations found in `python/tvm/topi/scan.py`_
+and the gpu versions in `python/tvm/topi/cuda/scan.py`_. In the case of our cumulative
+sum and product operations we write things directly in `TIR`_ which is the
+representation where tensor expressions and topi will lower into.
 
 .. _tensor expressions: https://tvm.apache.org/docs/tutorials/get_started/tensor_expr_get_started.html
 .. _TVM's operator inventory (topi): https://tvm.apache.org/docs/tutorials/topi/intro_topi.html
@@ -208,7 +208,7 @@ and topi will lower into.
 5. Hooking up Compute and Strategy with Relay
 ---------------------------------------------
 
-After you have implemented how your function can be computed we now need to glue it to our
+After you have implemented your compute function we now need to glue it to our
 relay operation. Within TVM this means not only defining the computation, but also the schedule
 for an operation. A strategy is a method which picks which computation and which schedule
 to use. For example, for 2D convolutions we might recognize we are doing a depthwise convolution
@@ -340,7 +340,7 @@ the arguments to the call node, as below. In ``src/relay/op/tensor/transform.cc`
 
     TVM_REGISTER_GLOBAL("relay.op._make.cumprod").set_body_typed(MakeCumprod);
 
-Where TVM_REGISTER_GLOBAL exposes the ``MakeCumsum`` and ``MakeCumprod`` functions
+Where ``TVM_REGISTER_GLOBAL`` exposes the ``MakeCumsum`` and ``MakeCumprod`` functions
 in Python via ``relay.op._make.cumsum(...)`` and ``relay.op._make.cumprod(...)``.
 
 7. Including a Cleaner Python API Hook
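One closing note on step 8: tests for operators like these usually compare the compiled Relay output against a NumPy oracle. A sketch of such an oracle (the helper name ``expected_cumprod`` is hypothetical; a real test would pass its result to ``tvm.testing.assert_allclose`` alongside the operator's output, as the tests in ``tests/python/relay/test_op_level3.py`` do):

```python
import numpy as np

def expected_cumprod(data, axis=None, dtype=None, exclusive=False):
    """NumPy oracle for cumprod tests (hypothetical helper).

    Mirrors the documented semantics: axis=None scans the flattened
    input, and exclusive=True shifts the scan right, seeding it with
    the multiplicative identity 1.
    """
    out = np.cumprod(data, axis=axis, dtype=dtype)
    if exclusive:
        scan_axis = 0 if axis is None else axis
        # Shift right along the scan axis and seed with the identity.
        out = np.roll(out, 1, axis=scan_axis)
        first = [slice(None)] * out.ndim
        first[scan_axis] = 0
        out[tuple(first)] = 1
    return out
```

A test then only has to loop over ``axis``, ``dtype``, and ``exclusive`` combinations and check the operator's output against this oracle for each one.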