Skip to content

Conversation

@vvchernov
Copy link
Contributor

I observed that VirtualMachine::SetInputTensorWithIndex(...) method has discrepancy between description (also see description for VirtualMachine::SetInput(...) which assumes zero copy if possible and uses the method) and implementation. It always create new NDArray and copies data to it if source input is DLTensor even if devices are the same. It reduces performance of multiple input models due to excess copying. The PR fixes this issue.

Note: I have a remark about current design. VirtualMachine has only set_input python method, the same method is used inside run and invoke methods with input args. But there is no set_input_zero_copy. In description I obsrved that set_input tries to not use copying if possible. Theoretically we can have problem if set_input is used, input tensors are released after that and when run or invoke are launched. As I know GraphExecutor does not have such problem.

@vvchernov vvchernov force-pushed the vc/vm_set_input_zero_copy branch 2 times, most recently from a7f7a67 to c21e22b Compare April 13, 2022 19:22
@vvchernov vvchernov changed the title WIP: [VirtualMachine] Zero copy in set_input when input is DLTensor [VirtualMachine] Zero copy in set_input when input is DLTensor Apr 14, 2022
@vvchernov vvchernov force-pushed the vc/vm_set_input_zero_copy branch from 2086bba to ea05472 Compare April 14, 2022 13:59
Copy link
Contributor

@jwfromm jwfromm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change a lot and your comments are excellent.

Copy link
Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about this change, it seems like a major change in invariants. I would rather you make a new method like "set_input_zero_copy" and expose that to the user to use.

Copy link
Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna block this until get another set of opinions on this. @mbs-octoml @altanh ?

@vvchernov
Copy link
Contributor Author

Hello @AndrewZhaoLuo! As you can see my note to the PR I said about the same, but currently set_input method of VirtualMachine already does zero copy for NDArray input. It means that if I add set_input_zero_copy I should change set_input to method which always copies external input to internal one (and performance for people who used it in their code is reduced). No problem, I can do it for me it is more reasonable approach. What do you think @jwfromm?

Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a comment nit. Thanks, copy overhead has been troubling me lately so I'm glad you're ahead of me.

std::vector<int64_t> shape;
for (int64_t i = 0; i < tensor->ndim; i++) {
shape.push_back(tensor->shape[i]);
if (dev.device_type == tensor->device.device_type &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a great change.

Could you update vm.py's set_input doc string to clearly state the by-copy vs by-ref semantics? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @mbs-octoml! I've added description and check device id for NDArray. But it looks like internal mechanism of copying does not take into account device id. Please see my changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that. Yeah, the codebase is not at all 'device_id clean' and will require an audit to find all the places it is either ignored or defaulted to '0'. One step at a time.

@AndrewZhaoLuo
Copy link
Contributor

Hello @AndrewZhaoLuo! As you can see my note to the PR I said about the same, but currently set_input method of VirtualMachine already does zero copy for NDArray input. It means that if I add set_input_zero_copy I should change set_input to method which always copies external input to internal one (and performance for people who used it in their code is reduced). No problem, I can do it for me it is more reasonable approach. What do you think @jwfromm?

After talking to MBS, your change does in fact match the intended semantics better

@AndrewZhaoLuo
Copy link
Contributor

Just please cover the nit

@vvchernov vvchernov force-pushed the vc/vm_set_input_zero_copy branch from 0ad16f6 to c5606d6 Compare April 15, 2022 19:45
@AndrewZhaoLuo AndrewZhaoLuo merged commit fafabc9 into apache:main Apr 15, 2022
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
…e#11003)

* method of creating of NDArray from external DLTensor was implemented

* set input without copying for DLTensor source

* code clean up

* update description and comments after review

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022
…e#11003)

* method of creating of NDArray from external DLTensor was implemented

* set input without copying for DLTensor source

* code clean up

* update description and comments after review

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
@vvchernov vvchernov deleted the vc/vm_set_input_zero_copy branch May 28, 2022 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants