[TIR][CodeGen] Process buffer elem_offset in target codegen #10582

wrongtest-intellif · 2022-03-11T15:40:00Z

After [TE][TIR] Implement layout transformations, non-flat memory buffers #9727, low-level TIR memory accesses operate on Buffer objects. A Buffer has an elem_offset field:

tvm/include/tvm/tir/buffer.h

Lines 79 to 80 in e34985b

/*! \brief The offset in terms of number of dtype elements (including lanes) */

PrimExpr elem_offset;

Thus the addressing rule for BufferLoad, BufferStore and T.address_of should also take this offset field into consideration.
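As a rough illustration of that addressing rule, the sketch below models a flat 1-D buffer in plain Python. `ToyBuffer`, `physical_index`, and `byte_offset` are hypothetical names made up for this example, not TVM APIs; the only thing taken from the header comment above is that elem_offset counts dtype elements from the data pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBuffer:
    """Hypothetical stand-in for a flat 1-D buffer (not tvm.tir.Buffer)."""
    name: str
    extent: int           # number of elements addressable through this buffer
    elem_offset: int = 0  # offset from the data pointer, in dtype elements
    dtype_bytes: int = 4  # 4 bytes per element, i.e. float32


def physical_index(buf: ToyBuffer, index: int) -> int:
    """Element position relative to the shared data pointer."""
    return buf.elem_offset + index


def byte_offset(buf: ToyBuffer, index: int) -> int:
    """Byte distance from the data pointer for a load, store, or address_of."""
    return physical_index(buf, index) * buf.dtype_bytes


alias = ToyBuffer("A0_2", extent=8, elem_offset=8)
print(physical_index(alias, 3), byte_offset(alias, 3))
```

Under these assumptions, a codegen that ignores elem_offset would compute the address of element 3 of the data pointer instead of element 11.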

Also, the buffer-merging functionality in the StorageRewrite pass currently creates alias buffers on the allocated buffer var, which denotes the start of the merged buffer. This looks ill-formed, since the indices accessed through an alias buffer exceed that alias buffer's extent.

# example from unit test
@T.prim_func
def func(A: T.Buffer[(4,), "float32"], A4: T.Buffer[(8,), "float32"]) -> None:
    A0 = T.allocate([16], "float32", "global:tag")
    A0_1 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    A0_2 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    A0_3 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    A0_4 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    for i in T.serial(8):
        A0_1[i] = A[i] + A[0] + T.float32(1)
    for i in T.serial(8):
        A0_2[8 + i] = A0_1[i] + A0_1[0] + T.float32(2)
    for i in T.serial(8):
        A0_3[i] = A0_2[8 + i] + A0_2[8] + T.float32(3)
    for i in T.serial(8):
        A0_4[8 + i] = A0_3[i] + A0_3[0] + T.float32(4)
    for i in T.serial(8):
        A4[i] = A0_4[8 + i] + A0_4[8] + T.float32(5)

It would be better to set elem_offset on the alias buffers, so that each alias buffer's address range is marked explicitly.

@T.prim_func
def func(A: T.Buffer[(4,), "float32"], A4: T.Buffer[(8,), "float32"]) -> None:
    A0 = T.allocate([16], "float32", "global:tag")
    A0_1 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    A0_2 = T.buffer_decl([8], dtype="float32", data=A0.data, elem_offset=8, scope="global:tag")
    A0_3 = T.buffer_decl([8], dtype="float32", data=A0.data, scope="global:tag")
    A0_4 = T.buffer_decl([8], dtype="float32", data=A0.data, elem_offset=8, scope="global:tag")
    for i in T.serial(8):
        A0_1[i] = A[i] + A[0] + T.float32(1)
    for i in T.serial(8):
        A0_2[i] = A0_1[i] + A0_1[0] + T.float32(2)
    for i in T.serial(8):
        A0_3[i] = A0_2[i] + A0_2[0] + T.float32(3)
    for i in T.serial(8):
        A0_4[i] = A0_3[i] + A0_3[0] + T.float32(4)
    for i in T.serial(8):
        A4[i] = A0_4[i] + A0_4[0] + T.float32(5)
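One way to see the difference between the two forms is to check each alias access against the alias buffer's own extent. The sketch below does that in plain Python for the A0_2 accesses of the example; `in_bounds` and the index lists are assumptions made for illustration and are not part of StorageRewrite.

```python
ALIAS_EXTENT = 8   # shape of A0_2 in both versions
ELEM_OFFSET = 8    # where A0_2 starts inside the 16-element allocation A0


def in_bounds(extent: int, indices) -> bool:
    """True if every index is a valid position in a buffer of this extent."""
    return all(0 <= i < extent for i in indices)


# Without elem_offset: the offset is baked into the index (A0_2[8 + i]).
indices_without_offset = [8 + i for i in range(8)]
# With elem_offset=8 on the alias: plain A0_2[i].
indices_with_offset = [i for i in range(8)]

# Both forms touch the same elements of the underlying allocation.
physical_without = list(indices_without_offset)
physical_with = [ELEM_OFFSET + i for i in indices_with_offset]

print(in_bounds(ALIAS_EXTENT, indices_without_offset))
print(in_bounds(ALIAS_EXTENT, indices_with_offset))
print(physical_without == physical_with)
```

The first check fails and the second passes, while the physical elements are identical, which is the sense in which the elem_offset form marks the address range explicitly.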

junrushao · 2022-03-13T20:34:27Z

CC @Hzfengsy would you like to review this PR? Thanks a lot!

Hzfengsy · 2022-03-14T02:04:59Z

src/target/llvm/codegen_llvm.cc

  auto var_value = MakeValue(op->value);
  var_map_[op->var.get()] = var_value;
  var_value->setName(op->var->name_hint.c_str());
-  analyzer_->Bind(op->var, op->value);


Why don't we need bind here?

wrongtest-intellif · 2022-03-14

Bind makes the analyzer always expand the value expression during simplification and other analyses. That breaks the evaluation order specified by the lets, which should be respected in the codegen phase. The simplification this PR adds can trigger the issue on existing test cases.

I could use a local analyzer for this PR's purposes, but I think this is still an issue that needs to be resolved.
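To illustrate the kind of reordering this refers to, here is a plain-Python analogy rather than TVM code. It assumes the let value reads memory that is written before the variable is used; the function names are hypothetical, and the thread does not say whether the failing test cases hit exactly this pattern.

```python
def run_respecting_let(memory):
    """let x = A[0] is evaluated once, at the point of the let."""
    x = memory["A"][0]
    memory["A"][0] = 5.0      # a store that happens after the let
    return x                  # the use sees the value captured by the let


def run_with_value_inlined(memory):
    """Same program after substituting x by its bound expression A[0]."""
    memory["A"][0] = 5.0
    return memory["A"][0]     # the load now happens after the store


print(run_respecting_let({"A": [1.0]}))
print(run_with_value_inlined({"A": [1.0]}))
```

The two functions return different values for the same input, which is why expanding a let-bound value at its use site is not always a safe rewrite during codegen.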

Hzfengsy · 2022-03-14T02:06:25Z

src/target/llvm/codegen_llvm.h

  ExprDeepEqual deep_equal_;
  // binding of let variables. Enables duplicate var defs that map to same value
-  std::unordered_map<Var, const LetNode*, ObjectPtrHash, ObjectPtrEqual> let_binding_;
+  std::unordered_map<Var, PrimExpr, ObjectPtrHash, ObjectPtrEqual> let_binding_;


Could you please explain this change?

wrongtest-intellif · 2022-03-14

A local expression (e.g., a simplification result) can be visited, and pointers into it are then recorded. When the local scope ends, the backing object has expired, so a recorded LetNode* would dangle.

wrongtest-intellif · 2022-03-14T09:24:13Z

cc @Lunderberg

tqchen · 2022-03-14T15:51:19Z

Thanks @wrongtest . I agree handling elem_offset is going to be useful.

One thing to note here is that we want to ensure that alignment is handled properly. Specifically, when elem_offset is non-zero, low-level passes and analyses need to take elem_offset into account when analyzing the possible alignment properties of an access (this is something we should note, perhaps in the docs).

Something to take note of as well in #10505, cc @vinx13

wrongtest-intellif · 2022-03-15T01:35:19Z

Thanks for the reminder, @tqchen!
Regarding data_alignment, I have a further question: is data_alignment about the data pointer's address, or about the first element's address?

If the former, I think we should ensure the alignment property every time we create an alias buffer on an existing buffer var, regardless of whether elem_offset is non-zero. I would like to change the alignment handling in the StorageRewrite pass, since it has an explicit code path for buffer aliasing.

tvm/include/tvm/tir/buffer.h

Lines 84 to 90 in 2f7bb58

    
/*! \brief Alignment requirement of data pointer in bytes. */
int data_alignment;
/*!
 * \brief Factor of elem_offset field,
 *  elem_offset is guaranteed to be multiple of offset_factor.
 */
int offset_factor;

tqchen · 2022-03-16T12:58:38Z

It is about the data pointer's address.
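Given that reading, the alignment that can be assumed for the first element of an offset buffer follows from a divisibility argument. The sketch below is a plain-Python illustration, not a TVM analysis: both helper names are hypothetical, offsets are assumed to be non-negative integers, and offset_factor is assumed to be at least 1.

```python
import math


def access_alignment_bytes(data_alignment: int, elem_offset: int, dtype_bytes: int) -> int:
    """Largest byte alignment guaranteed for the buffer's first element.

    Assumes data_alignment describes the data pointer and elem_offset is a
    known constant counted in dtype elements.
    """
    return math.gcd(data_alignment, elem_offset * dtype_bytes)


def symbolic_alignment_bytes(data_alignment: int, offset_factor: int, dtype_bytes: int) -> int:
    """Bound when only 'elem_offset is a multiple of offset_factor' is known."""
    return math.gcd(data_alignment, offset_factor * dtype_bytes)


print(access_alignment_bytes(64, 0, 4))
print(access_alignment_bytes(64, 8, 4))
print(symbolic_alignment_bytes(64, 1, 4))
```

With a 64-byte aligned data pointer and float32 elements, an offset of 8 elements leaves only 32-byte alignment, and an unconstrained symbolic offset leaves only the 4-byte element alignment.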

vinx13 · 2022-03-16

buffer->elem_offset is already processed in FlattenBuffer / StorageFlatten: it is added to the buffer indices (https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L307). Since subsequent passes can still declare buffers with elem_offset, I agree they still need to be handled. One option is to change GetFlattenedBuffer (https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L334) so that the output buffer has its elem_offset erased, ensuring that elem_offset is not processed more than once.
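The double-processing concern can be shown with a toy model. The sketch below is plain Python with hypothetical names (`FlatBuffer`, `fold_offset`, `fold_offset_keeping_field`); it does not reproduce the actual logic in buffer.cc, only the idea of folding the offset into the index once and then erasing it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlatBuffer:
    """Hypothetical 1-D buffer record (not tvm.tir.Buffer)."""
    name: str
    elem_offset: int = 0


def fold_offset(buf: FlatBuffer, index: int):
    """Fold elem_offset into the index and return a buffer with the offset erased."""
    return replace(buf, elem_offset=0), buf.elem_offset + index


def fold_offset_keeping_field(buf: FlatBuffer, index: int):
    """Same folding, but the offset stays on the buffer, so a rerun adds it again."""
    return buf, buf.elem_offset + index


buf = FlatBuffer("A", elem_offset=8)
erased, idx = fold_offset(buf, 3)
print(fold_offset(erased, idx)[1])             # second application is a no-op
kept, idx2 = fold_offset_keeping_field(buf, 3)
print(fold_offset_keeping_field(kept, idx2)[1])  # offset applied twice
```

Erasing the field makes the folding idempotent, whereas keeping it means any later consumer that also honors elem_offset shifts the access a second time.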

cc @Lunderberg

wrongtest-intellif · 2022-03-17T01:41:00Z

Hi, @vinx13, many thanks for the notes:)

it is added to the buffer indices

I think the reason we may still need elem_offset is that passes after flattening may create aliased buffers. If we only add the offset to the indices, accesses through the alias buffer may no longer be well-formed. Assume a buffer A[8] is aliased to B + 8; if we add the offset to the access index, we get

B = T.allocate([16], dtype="float32")
A = T.buffer_decl([8], dtype="float32", data=B.data)
for i in range(8):
    T.evaluate(A[i + 8])  # index is out of bounds for A's declared shape

Also note that USMP uses a form like

with T.let(A.data, T.address_of(B[8], dtype="handle")):
    for i in range(128):
        T.evaluate(A[i])

However, using a let binding and address_of definitely seems to increase the complexity if we want subsequent analyses of aliasing.

All three alternatives try to represent GEP semantics, since we cannot directly add an offset to a buffer var of handle dtype. From my understanding, each has pros and cons:

- add offset to access index
  - pro: cleanest
  - con: the IR may end up in a somewhat strange, ill-formed shape
- bind a new buffer var
  - pro: the buffer access semantics are correct
  - con: adds complexity to alias analysis
- use elem_offset field
  - pro: the buffer access semantics are correct
  - con: handling the field is ad hoc
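The three alternatives can be compared side by side on the A[8] aliased to B + 8 example. The sketch below is a plain-Python model with hypothetical function names; each function returns the element of B that is touched and whether the index written in the IR fits A's declared extent.

```python
ALLOC_EXTENT = 16   # B = T.allocate([16], ...)
ALIAS_EXTENT = 8    # A is declared with shape [8]
OFFSET = 8          # A starts at B + 8


def index_offset_form(i: int):
    """A shares B's data pointer; the offset is added to the index: A[i + 8]."""
    index = i + OFFSET
    return index, 0 <= index < ALIAS_EXTENT


def rebound_pointer_form(i: int):
    """A.data is let-bound to address_of(B[8]); the access is A[i]."""
    base = OFFSET  # position of A.data relative to B.data
    return base + i, 0 <= i < ALIAS_EXTENT


def elem_offset_form(i: int):
    """A shares B's data pointer and carries elem_offset=8; the access is A[i]."""
    return OFFSET + i, 0 <= i < ALIAS_EXTENT


for i in (0, 7):
    print(index_offset_form(i), rebound_pointer_form(i), elem_offset_form(i))
```

All three reach the same element of B; only the first writes an index that falls outside A's own shape, which matches the "ill-formed" concern in the list above.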

Lunderberg · 2022-03-17T14:54:36Z

I'm overall in favor of keeping logic in the lowering stages, so that it wouldn't need to be repeated across multiple different codegens. I agree that the current state where a non-zero elem_offset can be silently ignored isn't a good state, but I don't think it is good to handle at the codegen level, because the semantics being handled aren't specific to any one codegen.

What if we allow codegens to assume that the elem_offset is zero, but add a lowering pass that validates this assumption? Having such a check would also be a good place to define what other assumptions codegens are allowed to make about the TIR that they receive (e.g. no Prefetch nodes, no builtin::tvm_thread_allreduce, no warp scope memory), along with error messages specifying which lowering pass was expected to have lowered those constructs.
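A minimal sketch of what such a validation step could look like, written in plain Python over a toy buffer summary rather than as a real TVM pass. `BufferInfo`, `CodegenAssumptionError`, `verify_zero_elem_offset`, and the `responsible_pass` placeholder are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferInfo:
    """Hypothetical summary of a buffer seen in lowered IR (not a TVM class)."""
    name: str
    elem_offset: int = 0


class CodegenAssumptionError(ValueError):
    """Raised when lowered IR violates something codegens are allowed to assume."""


def verify_zero_elem_offset(buffers, responsible_pass="<offset-lowering pass>"):
    """Reject any buffer whose elem_offset was not lowered away before codegen."""
    offenders = [b for b in buffers if b.elem_offset != 0]
    if offenders:
        details = ", ".join(f"{b.name} (elem_offset={b.elem_offset})" for b in offenders)
        raise CodegenAssumptionError(
            f"non-zero elem_offset reached codegen: {details}; "
            f"expected {responsible_pass} to have lowered it"
        )
    return True


print(verify_zero_elem_offset([BufferInfo("A0_1"), BufferInfo("A0_3")]))
```

The same structure could host the other checks mentioned above, each reporting which lowering pass was expected to remove the construct.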

wrongtest-intellif · 2022-03-30T03:35:54Z

cc @junrushao1994 Hi, I now think this PR should be closed, because
(1) we do not want elem_offset to be left to the codegen part
(2) the community is working on the multi-dim elem offset refactor #10816

For new elem offsets introduced in the lowering phase, I think a verifying pass as @Lunderberg suggested is a great idea; it could be introduced in another PR after the refactoring is done.

wrongtest-intellif mentioned this pull request Mar 11, 2022

[Tracking Issue] BufferAccess Migration Followup Items #10505

Closed

19 tasks

wrongtest-intellif force-pushed the process_buffer_elem_offset_in_codegen branch 3 times, most recently from be779cc to 515133e Compare March 13, 2022 12:36

Hzfengsy reviewed Mar 14, 2022

wrongtest-intellif force-pushed the process_buffer_elem_offset_in_codegen branch from 515133e to f53961d Compare March 14, 2022 05:04

Hzfengsy approved these changes Mar 14, 2022

wrongtest-intellif added 6 commits March 16, 2022 15:23

process buffer's elem_offset in target codegen

a1be7fe

do not bind let value into analyzer

6191e48

fix vta transform of buffer elem_offset

8915fbc

fix unsafe LetNode* dict value in let_binding_

468fed1

narrow down buffer elem offset

61c4d9c

fix usmp's elem offset typo

82d62dc

wrongtest-intellif force-pushed the process_buffer_elem_offset_in_codegen branch from f53961d to 82d62dc Compare March 16, 2022 07:24

vinx13 requested changes Mar 16, 2022

wrongtest-intellif closed this Apr 2, 2022
