Skip to content

Conversation

@Lunderberg
Copy link
Contributor

@Lunderberg Lunderberg commented Oct 28, 2021

This RFC introduces a BufferPointer node, to be used by BufferLoad and BufferStore nodes.

Rendered markdown link

@tqchen
Copy link
Member

tqchen commented Oct 29, 2021

cc @junrushao1994 @Hzfengsy @vinx13

@tqchen tqchen added the status: need review RFC needs review label Oct 29, 2021
Copy link
Member

@Hzfengsy Hzfengsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the RFC. IIUC, BufferPointer is somehow like the combination of Buffer and indices. It's good if we can have it, but my question is if it's necessary? As we are changing the core data structure, which is a breaking change, the cost is greater than the benefits in my opinion.

BTW, Could you please show some more design details? such as the data structure and the AST with BufferPointer? So that more people can join the discussion.

@@ -0,0 +1,116 @@
- Feature Name: (fill me in with a unique identifier, `my_awesome_feature`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, updating the name along with addressing the other points.

@spectrometerHBH
Copy link
Contributor

If we explicitly introduce a concept similar to pointer, there may be some related problems.

  • If we don't have corresponding language construct to independently define a pointer and use it, like
p = BufferPointer(A, [i, j], value_dtype="f32")

then BufferPointer is merely an IR data structure that attaches to BufferLoad/BufferStore, which benefits the developer but doesn't bring additional expressiveness for the user.

  • If we introduce such definition statements of pointer variables, then a latent problem is alias, though I can't immediately mock up an example to clearly show which effects this will bring to the IR transformation analysis.

@Lunderberg
Copy link
Contributor Author

BufferPointer is somehow like the combination of Buffer and indices. It's good if we can have it, but my question is if it's necessary? As we are changing the core data structure, which is a breaking change, the cost is greater than the benefits in my opinion.

Thank you, and that was the main concern that I had. This change would make the areas that I've been working on lately be much cleaner, but outside of that .

The breakage was minimized by keeping the existing BufferLoad/BufferStore constructors, while adding an overload that accepts BufferPointer, and the same could be added on the python side. That would improve the source backwards compatibility, though it would still break binary backwards compatibility.

BTW, Could you please show some more design details? such as the data structure and the AST with BufferPointer? So that more people can join the discussion.

Certainly. I won't be able to add them here until Monday, but I have a sample implementation at PR#9391.

If we don't have corresponding language construct to independently define a pointer and use it, ... , then BufferPointer is merely an IR data structure that attaches to BufferLoad/BufferStore, which benefits the developer but doesn't bring additional expressiveness for the user.

The current implementation does expose it, though primarily for testing purposes. At the moment, none of the code-generators handle it explicitly, so the only current use is with BufferLoad/BufferStore. It might be able to replace the builtin tvm_access_ptr, though the ease for developers was my main thought on it.

If we introduce such definition statements of pointer variables, then a latent problem is alias, though I can't immediately mock up an example to clearly show which effects this will bring to the IR transformation analysis.

Agreed. I kept to having the same buffer/indices as BufferLoad/BufferStore had, so that at least it wouldn't increase the risk. I wouldn't want to have pointer arithmetic allowed, since that would make analysis to avoid aliasing or out-of-bounds access be harder.

@vinx13
Copy link
Member

vinx13 commented Oct 29, 2021

I agree with @spectrometerHBH that changing the IR design may introduce problems. To reuse the code but BufferLoad/Store, we can introduce auxiliary functions or construct auxiliary data structure on the fly. The BufferPointer now is PrimExpr, we must be careful it doesn't appear in other places. We intentionally keep things restricted so that buffer do not alias and not addressable(e.g. no concrete ptr). During rewriting, we don't deal with pointers explicitly, I'd prefer keeping the concept of pointer abstract, to prevent potential issue of aliasing.

The `BufferLoad::pointer` and `BufferStore::pointer` could be generic
`PrimExpr`, instead of being `BufferPointer` objects. This would
require the datatype to be a handle, with an additional parameter to
indicate what is being stored. However, this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a truncated sentence at the end of the paragraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for that catch, and updated. I tend to bounce between different sentences when editing, and I thought I had gone through it to catch all of these, but it looks like I missed one.

@Lunderberg
Copy link
Contributor Author

Thank you for the feedback. Summarizing, it seems the major issues are as follows. I have some follow-up questions for the different points, listed in the sub-bullets.

  • Calling the new data structure BufferPointer brings in lots of C assumptions of pointer handling, which could either confuse users if those assumptions aren't met, or would significantly complicate compile-time analysis if those assumptions are met.

    • Would renaming it to BufferAccess address these concerns, per @vinx13's suggestion here, by avoiding the comparison to C-style pointers?
  • Having the new data structure subclass from PrimExpr means that it could be used outside of just BufferLoad or BufferStore, and there's no indication at the site of the StmtExprMutator subclass that it should only return a specific data structure.

    • What if we add another functor, similar to the existing ExprMutator and StmtMutator, which visits objects whose mutated type must be identical to the original type? That is, the method signature would be BufferAccess VisitObj_(BufferAccessNode* node).
    • This would also be applicable to other cases that aren't currently handled, such as the Buffer object itself. As an example, currently a buffer's shape may be defined in terms of a tir::Var. However, tvm::tir::Substitute doesn't touch the buffer object, and so that variable in the buffer's shape doesn't get replaced.
  • Any change to TIR is a breaking change, and therefore should only be done when expanding capabilities. Ease of development alone isn't sufficient justification for changing the TIR.

    • For my own understanding, what level of backwards compatibility does TVM aim for? I wasn't able to find specific documentation on it. The current implementation wouldn't satisfy any of the conditions below, but could with small modifications satisfy condition (a). Condition (b) would be harder to meet, mostly due to the direct access of TIR node member variables, but might be possible with dummy classes and implicit type conversions.
      1. Source compatibility with external Python, but external C++ code may require updates to remain compatible.
      2. Source compatibility with external Python and C++, but external C++ code may require recompilation.
      3. Binary compatibility with external code, with new versions of libtvm.so usable as a drop-in replacement without recompilation.
      4. Bug-for-bug compatibility with previous versions.
    • This RFC was intended to make the proposed RFC#0039, which among other changes extends the use of BufferLoad/BufferStore and deprecates Load/Store, be simpler to implement. Would this TIR change be more acceptable as part of a larger change, adding functionality and impacting similar TIR nodes?

@Lunderberg
Copy link
Contributor Author

From @vinx13's suggestion on the TVM PR thread, quoting here for the discussion.

Given it is common that only buffer + indices matters in rewriting, I'd suggest refactor into RewriteBufferAccess(pair<buffer, indices>) without modifying IR itself wherever possible. We can use a pair or auxiliary data structure BufferAccess (avoid potential confusion with concrete buffer pointer) to hold the input/output. We can discuss more in the RFC

My concern there is for newcomers coming to the codebase. Utility functions are effective if somebody knows that the utility functions exist, but require that knowledge first. Making the preferred method to handle a specific task be the default way to do it, without requiring the utility function, is easier for somebody to find. That said, I agree that a utility function would be a good fallback if we don't want to change the TIR.

@vinx13
Copy link
Member

vinx13 commented Nov 2, 2021

Thank you eric for bringing up the discussion points.

The primary reason for this discussion is because of the underlying tradeoffs:

  • T0: On one hand, for analysis/rewriting that do not need to differentiate read/write relation, having a common BufferAccess class is helpful.
  • T1: On the other hand, many passes would also need the read/write information, having an explicit BufferAccess can results in implementations that forget to consider r/w relations(because of the common BufferAccess class), or other relations that follow.

So from a developer experience pov, it is hard to tell which way is clearly better. T0 would call for the introduction of BufferAccess, while T1 would favor the original AST where read/write are explicit.

Given most goals of T0 can be implemented via a common util function, the gain of T0 may not be as large. From my personal pov, considering all the factors, the clarity of read/write relation(in T1) and simplicity of AST outweights the benefits in T0.

I agree that IR change is certainly important even if it is just a developer win, as long as it is strictly better than the old design. However, because of the impact, we might need some more deliberations on the tradeoffs. This review process already helps us to uncover some of the hidden problems(such as the initial leaking issue into values by using a BufferPointer, which can be addressed by having BufferAccess as an object). I also agree that it is better to separate the TIR change from RFC0039, we just need more discussions about the design here.

I am looking forward to build better designs together :)

@Lunderberg
Copy link
Contributor Author

Good point on the tradeoffs, and I agree that this turned out to be a longer discussion that should be separate from the changes in RFC0039.

For the T1 developer experience, I've been assuming that they would still be written using visitors of the BufferLoad and BufferStore objects, and so the main difference would be accessing op->access->buffer instead of op->buffer.

@Lunderberg
Copy link
Contributor Author

Following a conversation with @vinx13, below are the overall points of discussions.

  • The improvement to writability/readability aren't as universal as I had previously thought. It improves some use cases, is neutral to others, but there are some transforms that are made somewhat worse by using BufferAccess. Examples that we found of each are shown below.

    • PrimFuncSpecializer is made much simpler by using BufferAccess. This is generally the case for transforms that replace buffers/indices.

      Current implementation (click to expand)
      Stmt VisitStmt_(const BufferStoreNode* op) final {
        Stmt stmt = StmtExprMutator::VisitStmt_(op);
        op = stmt.as<BufferStoreNode>();
        ICHECK(op != nullptr);
        auto it = buffer_map_.find(op->buffer);
        if (it == buffer_map_.end()) {
          return GetRef<BufferStore>(op);
        } else {
          auto n = CopyOnWrite(op);
          n->buffer = it->second;
          return Stmt(n);
        }
      }
      
      PrimExpr VisitExpr_(const BufferLoadNode* op) final {
        PrimExpr expr = StmtExprMutator::VisitExpr_(op);
        op = expr.as<BufferLoadNode>();
        ICHECK(op != nullptr);
        auto it = buffer_map_.find(op->buffer);
        if (it == buffer_map_.end()) {
          return GetRef<BufferLoad>(op);
        } else {
          auto n = make_object<BufferLoadNode>(*op);
          n->buffer = it->second;
          return PrimExpr(n);
        }
      }
      Cleaner implementation, current TIR with helper function (click to expand)
      Stmt VisitStmt_(const BufferStoreNode* op) final {
        BufferStore store = Downcast<BufferStore>(StmtExprMutator::VisitStmt_(op));
      
        auto writer = store.CopyOnWrite();
        ModifyBuffer(writer->buffer);
      
        return std::move(store);
      }
      
      PrimExpr VisitStmt_(const BufferLoadNode* op) final {
        BufferLoad load = Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(op));
      
        auto writer = load.CopyOnWrite();
        ModifyBuffer(writer->buffer);
      
        return std::move(load);
      }
      
      void ModifyBuffer(Buffer& buffer) {
        auto it = buffer_map_.find(buffer);
        if (it != buffer_map_.end()) {
          buffer = it->second;
        }
      }
      Cleanest implementation, uses `BufferAccess` (click to expand)
      BufferAccess VisitExpr_(const BufferAccessNode* op) final {
        BufferAccess access = Downcast<BufferAccess>(StmtExprMutator::VisitExpr_(op));
      
        auto it = buffer_map_.find(op->buffer);
        if (it != buffer_map_.end()) {
          auto writer = access.CopyOnWrite();
          writer->buffer = it->second;
        }
      
        return Downcast<PrimExpr>(access);
      }
    • ComputeInliner is largely unchanged by using BufferAccess. This is largely the case for transforms that read the buffers/indices, then perform some replacement to remove the buffer use.

      Current implementation (click to expand)
      PrimExpr VisitExpr_(const BufferLoadNode* _load) final {
        BufferLoad load = Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(_load));
        if (!load->buffer.same_as(inlined_buffer_)) {
          return std::move(load);
        }
        return ReplaceInlinedBuffer(std::move(load));
      }
      
      PrimExpr ReplaceInlinedBuffer(BufferLoad load) {
        SetIndexSubstitution(load->indices);
        return Substitute(inlined_store_->value, idx_sub_);
      }
      Implementation with `BufferAccess` (click to expand)
      PrimExpr VisitExpr_(const BufferLoadNode* _load) final {
        BufferLoad load = Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(_load));
        if (!load->access->buffer.same_as(inlined_buffer_)) {
          return std::move(load);
        }
        return ReplaceInlinedBuffer(std::move(load));
      }
      
      PrimExpr ReplaceInlinedBuffer(BufferLoad load) {
        SetIndexSubstitution(load->access->indices);
        return Substitute(inlined_store_->value, idx_sub_);
      }
    • CacheReadRewriter is made worse by use of BufferAccess. This is largely the case when a transform interacts with only loads, or only stores.

      Current implementation (click to expand)
      PrimExpr VisitExpr_(const BufferLoadNode* load) final {
        if (load->buffer.same_as(info_->read_buffer)) {
          ObjectPtr<BufferLoadNode> n = make_object<BufferLoadNode>(*load);
          n->buffer = info_->write_buffer;
          return PrimExpr(n);
        }
        return ExprMutator::VisitExpr_(load);
      }
      Visiting `BufferAccess` (click to expand)
      // Need to make sure that we only rewrite access that occurs within
      // BufferLoad, which was implicit in the current implementation from
      // only visiting BufferLoad.
      
      Stmt VisitStmt_(const BufferStoreNode* op) final {
        ICHECK(!op->access->buffer.same_as(info_->read_buffer))
            << "Unexpected write to cache_read buffer " << op->access->buffer->name;
        return StmtExprMutator::VisitStmt_(op);
      }
      
      PrimExpr VisitExpr_(const BufferAccessNode* op) final {
        auto access = Downcast<BufferAccess>(StmtExprMutator::VisitExpr_(op));
        if (access->buffer.same_as(info_->read_buffer)) {
          auto writer = access.CopyOnWrite();
          writer->buffer = info_->write_buffer;
        }
        return std::move(access);
      }
      Rewrite `BufferAccess` while visiting `BufferLoad` (click to expand)
      PrimExpr VisitExpr_(const BufferLoadNode* op) final {
        auto load = Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(op));
        
        if (load->access->buffer.same_as(info_->read_buffer)) {
          // Need to rewrite the `BufferAccess` before we can rewrite the `BufferLoad`.
          BufferAccess access = load->access;
          auto access_writer = access.CopyOnWrite();
          access_writer->buffer = info_->write_buffer;
      
          auto load_writer = load.CopyOnWrite();
          load_writer->access = access;
        }
        return std::move(load);
      }
  • The BufferAccess object would have the potential to replace the tvm_access_ptr, which is used to generate a pointer into a buffer, which can then be passed to hardware intrinsic functions.

    • Pros

      • Layout transformations that occur on graphs including a tvm_access_ptr do not need special handling.
      • Could replace the built-in call with a TIR node, which would be easier to access from other transformations.
    • Cons

      • TIR graphs that use tvm_access_ptr must conform to the layout that is used by the hardware intrinsic, and probably shouldn't have layout transformations applied to it.
      • Would introduce pointers as a first-class concept in TIR.

Overall, our conclusion was that since the ergonomics of BufferAccess can make some types of transforms worse to read, there isn't a strong argument to add it in. The specific commit that simplified transforms that could be simplified with BufferAccess identifies passes where a helper function could simplify the pass itself. We'd like to wait a few weeks to see additional arguments in favor of introducing BufferAccess come up, but otherwise would break the tie in favor of not changing the existing TIR.

@Lunderberg
Copy link
Contributor Author

Lunderberg commented Nov 8, 2021

A slight modification to the example of using a helper function to simplify implementation of the visitors for BufferStore and BufferLoad. Use of CopyOnWrite to pass a non-const reference to the helper function would cause unnecessary copies in the case where no change is necessary. The copy in CopyOnWrite occurs when CopyOnWrite() is called, not when the write itself happens. Therefore, the helper function should be rewritten in order to avoid this copy. This doesn't change the conclusion, but would change the recommended way to write the helper function.

BufferLoad/BufferStore, using templated helper function
Stmt VisitStmt_(const BufferStoreNode* op) final {
  return ModifyBuffer(Downcast<BufferStore>(StmtExprMutator::VisitStmt_(op)));
}

PrimExpr VisitStmt_(const BufferLoadNode* op) final {
  return ModifyBuffer(Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(op)));
}

template<typename Node>
Node ModifyBuffer(Node node) {
  auto it = buffer_map_.find(node->buffer);
  if (it != buffer_map_.end()) {
    auto writer = node.CopyOnWrite();
    writer->buffer = it->second;
  }

  return node;
}

(I think the original helper function could still work if obj.CopyOnWrite()->field returned a proxy object that was convertible to the underlying type, and whose operator= performed the copy itself. However, that's probably more trouble than it's worth.)

@areusch
Copy link
Contributor

areusch commented Nov 10, 2021

cc @manupa-arm

@vinx13
Copy link
Member

vinx13 commented Dec 7, 2021

If there are no further comments, shall we conclude this?

@Lunderberg
Copy link
Contributor Author

Agreed, closing both this and the implementation PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: need review RFC needs review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants