[TIR][CodeGen] Process buffer elem_offset in target codegen #10582
be779cc to
515133e
Compare
|
CC @Hzfengsy would you like to review this PR? Thanks a lot! |
| auto var_value = MakeValue(op->value); | ||
| var_map_[op->var.get()] = var_value; | ||
| var_value->setName(op->var->name_hint.c_str()); | ||
| analyzer_->Bind(op->var, op->value); |
Why don't we need bind here?
Bind means the analyzer will always expand the value expression during simplification and other analyses. That breaks the evaluation order specified by the lets, which should be respected in the codegen phase. The issue can be triggered on existing test cases by the simplification this PR adds.
I could use a local analyzer for this PR's purposes, but I think this is still an issue to resolve.
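To make the evaluation-order concern concrete, here is a minimal sketch in plain Python. It is a toy model, not TVM's analyzer: the dictionary standing in for memory and both function names are assumptions made for illustration. It only shows why substituting a let-bound expression at its use site can change the result when a store sits in between.

```python
def run_with_let(memory):
    # let x = A[0]: the load happens once, at the binding point.
    x = memory["A"][0]
    memory["A"][0] = 5          # a later store to the same location
    return x + 1                # uses the value captured before the store


def run_with_inlined_value(memory):
    # The bound expression A[0] is expanded at the use site instead,
    # so the load is now evaluated after the store.
    memory["A"][0] = 5
    return memory["A"][0] + 1
```

The two functions differ only in where the load is evaluated, which is exactly the ordering that a let is meant to pin down.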
| ExprDeepEqual deep_equal_; | ||
| // binding of let variables. Enables duplicate var defs that map to same value | ||
| std::unordered_map<Var, const LetNode*, ObjectPtrHash, ObjectPtrEqual> let_binding_; | ||
| std::unordered_map<Var, PrimExpr, ObjectPtrHash, ObjectPtrEqual> let_binding_; |
Could you please explain this change?
A local expression (e.g., a simplify result) can be visited, and pointers to its nodes get recorded. But when the local scope ends, the backing object expires, leaving the recorded pointers dangling.
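The lifetime problem described here can be imitated in plain Python with weak references. This is only an analogy: `Node` is a hypothetical stand-in for an IR node, and the immediate expiry relies on CPython's reference counting. A non-owning entry dies with the local scope, while an owning entry keeps the object alive.

```python
import weakref


class Node:
    """Hypothetical stand-in for an IR node; not a TVM class."""

    def __init__(self, name):
        self.name = name


def record_non_owning(table):
    local = Node("simplified")          # temporary created in a local scope
    table["x"] = weakref.ref(local)     # non-owning, similar to a raw pointer


def record_owning(table):
    local = Node("simplified")
    table["x"] = local                  # owning reference keeps the node alive
```

After `record_non_owning` returns, dereferencing the stored weak reference yields `None`; after `record_owning` returns, the node is still reachable through the table.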
515133e to
f53961d
Compare
|
cc @Lunderberg |
|
Thanks @wrongtest . I agree handling elem_offset is going to be useful. One thing to note here is that we want to ensure that alignment is handled properly. Specifically, when elem_offset is non-zero, low-level passes and analyses need to take elem_offset into account when analyzing the possible alignment properties of an access (this is something we should note, perhaps in the doc). |
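As a rough illustration of the alignment point, the sketch below computes the byte alignment that can still be guaranteed for a buffer's first element once a constant elem_offset is applied. The function name and the assumption that the offset is a compile-time constant are illustrative only; this is not an existing TVM utility.

```python
from math import gcd


def guaranteed_alignment(base_align_bytes, elem_offset, elem_bytes):
    """Largest byte alignment guaranteed for address base + elem_offset * elem_bytes,
    given only that base is a multiple of base_align_bytes."""
    offset_bytes = elem_offset * elem_bytes
    # gcd(a, 0) == a, so a zero offset keeps the full base alignment.
    return gcd(base_align_bytes, offset_bytes)
```

For example, with a 64-byte aligned base and float32 elements, an offset of 8 elements still guarantees 32-byte alignment, while an offset of 3 elements guarantees only 4 bytes.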
|
Thanks for the reminder @tqchen ! If the former, I think we should ensure the alignment property every time we create an alias buffer to existing buffer vars, no matter whether the Lines 84 to 90 in 2f7bb58
|
f53961d to
82d62dc
Compare
|
it is about |
vinx13
left a comment
buffer->elem_offset is already processed in FlattenBuffer / StorageFlatten, where it is added to the buffer indices (https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L307). Since subsequent passes can still declare buffers with elem_offset, I agree they still need to be handled. One option is to change GetFlattenBuffer (https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L334) so that the output buffer has its elem_offset erased, ensuring elem_offset is not processed more than once.
cc @Lunderberg
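The "fold once, then erase" idea can be sketched with a toy model. `Buf` and `flatten_access` are hypothetical names, compact row-major layout is assumed, and none of this is the code behind GetFlattenBuffer. The point is only that once the offset is folded into the index and reset to zero, flattening again does not apply it a second time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buf:
    shape: tuple
    elem_offset: int = 0


def flatten_access(buf, indices):
    """Return (flat_buffer, flat_index) with elem_offset folded into the index."""
    flat = 0
    size = 1
    for extent, idx in zip(buf.shape, indices):
        flat = flat * extent + idx      # compact row-major layout (assumed)
        size *= extent
    # The returned buffer carries elem_offset == 0, so a second call
    # cannot add the offset again.
    return Buf(shape=(size,), elem_offset=0), flat + buf.elem_offset
```

Flattening the access `(1, 2)` on a `(2, 4)` buffer with elem_offset 8 gives flat index 14, and flattening the result again leaves the index unchanged.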
|
Hi, @vinx13, many thanks for the notes:)
I think why we may still need
    B = T.allocate([16], dtype="float32")
    A = T.buffer_decl([8], dtype="float32", data=B.data)
    for i in range(8):
        T.evaluate(A[i + 8])  # index out of bound form
Also note that the USMP uses a form like
    with T.let(A.data, T.address_of(B[8], dtype="handle")):
        for i in range(128):
            T.evaluate(A[i])
However, use let binding and
All three alternatives try to represent gep semantics, since we cannot directly do addition on a buffer var of handle dtype. From my understanding, they all have some pros and cons:
|
|
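A small plain-Python model may help compare the out-of-bound index form with an explicit elem_offset. `AliasBuffer` is a hypothetical class written for this comparison and is not part of TVM. Both forms reach the same element of the backing storage, but only the elem_offset form keeps the index inside the alias buffer's declared extent.

```python
class AliasBuffer:
    """Hypothetical alias view over shared backing storage."""

    def __init__(self, data, extent, elem_offset=0):
        self.data = data
        self.extent = extent
        self.elem_offset = elem_offset

    def in_bounds(self, index):
        # Bounds are checked against the alias's own declared extent.
        return 0 <= index < self.extent

    def load(self, index):
        # The physical position is the alias offset plus the logical index.
        return self.data[self.elem_offset + index]


backing = list(range(16))                      # plays the role of B
plain_alias = AliasBuffer(backing, extent=8)   # index must carry the +8
offset_alias = AliasBuffer(backing, extent=8, elem_offset=8)
```

Here `plain_alias.load(3 + 8)` and `offset_alias.load(3)` read the same element, yet index 11 is outside `plain_alias`'s extent while index 3 is inside `offset_alias`'s.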
I'm overall in favor of keeping logic in the lowering stages, so that it wouldn't need to be repeated across multiple different codegens. I agree that the current state where a non-zero What if we allow codegens to assume that the |
|
cc @junrushao1994 Hi, now I think this PR should be closed~, for For new elem_offset values occurring in the lowering phase, I think a verifying pass, as @Lunderberg suggested, is a great idea; it could be introduced by another PR after the refactoring is done. |
After [TE][TIR] Implement layout transformations, non-flat memory buffers #9727, the low-level TIR memory access is on Buffer objects. The Buffer has elem_offset (tvm/include/tvm/tir/buffer.h, Lines 79 to 80 in e34985b).
Thus the addressing rule for BufferLoad, BufferStore and T.address_of should also take this offset field into consideration.
Also, the buffer combine functionality in the StorageRewrite pass currently creates alias buffers to the alloc buffer var, which denotes the start offset of the merged buffer. But this seems ill-formed, since the alias buffer's accessed indices exceed the alias buffer extent.
It would be great to set elem_offset on alias buffers, so that each alias buffer's address range is marked explicitly.
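To illustrate what explicitly marked address ranges could look like, here is a minimal sketch that packs several alias buffers into one merged allocation. `plan_aliases` and `address` are hypothetical helpers, consecutive packing is an assumption, and this does not reproduce what StorageRewrite actually does.

```python
def plan_aliases(extents):
    """Assign each alias a consecutive elem_offset inside one merged allocation.

    `extents` maps alias name -> number of elements.
    Returns ({name: (elem_offset, extent)}, merged_extent).
    """
    plan, offset = {}, 0
    for name, extent in extents.items():
        plan[name] = (offset, extent)
        offset += extent
    return plan, offset


def address(plan, name, index):
    """Element position in the merged buffer for alias `name` at `index`."""
    elem_offset, extent = plan[name]
    if not 0 <= index < extent:
        raise IndexError(f"{name}[{index}] is outside extent {extent}")
    return elem_offset + index
```

With this layout every access index stays within its own alias extent, and the offset alone determines where the alias lives in the merged buffer.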