[VirtualMachine] Zero copy in set_input when input is DLTensor #11003
jwfromm left a comment
I like this change a lot and your comments are excellent.
AndrewZhaoLuo left a comment
Hmm, I'm not sure about this change, it seems like a major change in invariants. I would rather you make a new method like "set_input_zero_copy" and expose that to the user to use.
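To make the suggestion concrete, here is a minimal sketch in plain Python of a two-method API. `MockVM` and `set_input_zero_copy` are hypothetical names used for illustration, not TVM's `VirtualMachine` API: `set_input` snapshots the caller's bytes, while `set_input_zero_copy` keeps a `memoryview` that shares them, so a later write to the caller's buffer is visible only through the zero-copy binding.

```python
class MockVM:
    """Hypothetical stand-in for a VM executor; not TVM's VirtualMachine."""

    def __init__(self):
        self._inputs = {}

    def set_input(self, name, data):
        # By-copy semantics: store a private snapshot of the caller's bytes.
        self._inputs[name] = bytes(data)

    def set_input_zero_copy(self, name, data):
        # By-reference semantics: store a view sharing the caller's memory.
        self._inputs[name] = memoryview(data)

    def run(self, name):
        # Stand-in for inference: return the values the VM currently sees.
        return list(self._inputs[name])


buf = bytearray([1, 2, 3])
vm = MockVM()
vm.set_input("copied", buf)
vm.set_input_zero_copy("shared", buf)
buf[0] = 42  # the caller writes to its buffer after binding
print(vm.run("copied"))  # [1, 2, 3]
print(vm.run("shared"))  # [42, 2, 3]
```

Keeping the by-reference path behind a separately named method makes the aliasing explicit at the call site, which is the invariant concern raised here.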
AndrewZhaoLuo left a comment
Gonna block this until I get another set of opinions on this. @mbs-octoml @altanh ?
Hello @AndrewZhaoLuo! As you can see from my note on the PR, I made roughly the same point, but currently …
mbs-octoml left a comment
LGTM. Just a comment nit. Thanks, copy overhead has been troubling me lately so I'm glad you're ahead of me.
std::vector<int64_t> shape;
for (int64_t i = 0; i < tensor->ndim; i++) {
  shape.push_back(tensor->shape[i]);
if (dev.device_type == tensor->device.device_type &&
Thank you, this is a great change.
Could you update vm.py's set_input doc string to clearly state the by-copy vs by-ref semantics? Thanks.
Hello @mbs-octoml! I've added the description and a device id check for NDArray. However, it looks like the internal copying mechanism does not take the device id into account. Please see my changes.
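A rough illustration of why the device id matters, assuming a hypothetical `Device` tuple that mirrors the `device_type`/`device_id` fields of DLPack's `DLDevice` (the string values are illustrative only): comparing only the type treats `cuda:0` and `cuda:1` as interchangeable, so a zero-copy path would alias memory that lives on a different GPU.

```python
from typing import NamedTuple


class Device(NamedTuple):
    # Hypothetical mirror of DLPack's DLDevice fields; values are illustrative.
    device_type: str
    device_id: int


def same_device_type_only(a: Device, b: Device) -> bool:
    # Too weak: ignores which device of that type actually holds the data.
    return a.device_type == b.device_type


def same_device(a: Device, b: Device) -> bool:
    # Sharing memory without a copy needs both the type and the index to match.
    return a.device_type == b.device_type and a.device_id == b.device_id


vm_dev = Device("cuda", 0)
tensor_dev = Device("cuda", 1)
print(same_device_type_only(vm_dev, tensor_dev))  # True: would alias another GPU's memory
print(same_device(vm_dev, tensor_dev))            # False: falls back to a copy
```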
Thanks for that. Yeah, the codebase is not at all 'device_id clean' and will require an audit to find all the places it is either ignored or defaulted to '0'. One step at a time.
After talking to MBS, your change does in fact match the intended semantics better.

Just please cover the nit.
…e#11003)
* method for creating an NDArray from an external DLTensor was implemented
* set input without copying for DLTensor source
* code clean up
* update description and comments after review

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
I observed a discrepancy between the description and the implementation of the VirtualMachine::SetInputTensorWithIndex(...) method (see also the description of VirtualMachine::SetInput(...), which uses this method and assumes zero copy where possible). When the source input is a DLTensor, the method always creates a new NDArray and copies the data into it, even when the source and target devices are the same. This excess copying reduces performance for models with multiple inputs. This PR fixes the issue.
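The intended policy can be sketched as follows. This is plain Python with hypothetical `Device`, `ExternalTensor`, and `bind_input` names, not the actual C++ in `SetInputTensorWithIndex`: when the caller's tensor already lives on the VM's device (same type and id), the input is wrapped without copying; otherwise a bytes snapshot stands in for the allocate-and-transfer step a real runtime would perform.

```python
from typing import NamedTuple, Union


class Device(NamedTuple):
    # Hypothetical mirror of DLPack's DLDevice fields.
    device_type: str
    device_id: int


class ExternalTensor(NamedTuple):
    # Hypothetical stand-in for a caller-owned DLTensor: raw bytes plus a device.
    data: bytearray
    device: Device


def bind_input(t: ExternalTensor, vm_device: Device) -> Union[memoryview, bytes]:
    """Alias the caller's memory when devices match; copy otherwise."""
    if t.device == vm_device:
        # Same device type and id: wrap the existing buffer without copying.
        return memoryview(t.data)
    # Different device: a real runtime would allocate on vm_device and copy
    # the data across; a bytes snapshot stands in for that transfer here.
    return bytes(t.data)


cpu0 = Device("cpu", 0)
t = ExternalTensor(bytearray([1, 2, 3, 4]), cpu0)

shared = bind_input(t, cpu0)               # zero copy
copied = bind_input(t, Device("cuda", 0))  # copy
t.data[0] = 7
print(shared[0], copied[0])  # 7 1
```

The trade-off is that the zero-copy branch now depends on the caller keeping `t.data` alive and unmodified until inference finishes, which is exactly the semantics change discussed in the review.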
Note: I have a remark about the current design. VirtualMachine has only the set_input Python method, and the same method is used inside the run and invoke methods for input args. There is no set_input_zero_copy. From the description, I observed that set_input tries to avoid copying where possible. In theory, this can cause a problem if set_input is called, the input tensors are then released, and run or invoke is launched afterwards. As far as I know, GraphExecutor does not have this problem.
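The lifetime risk can be illustrated with a small Python sketch. `InputBuffer`, `BorrowingVM`, and `OwningVM` are hypothetical: the borrowing VM keeps only a weak reference, modelling a non-owning view of the caller's DLTensor, while the owning VM holds a strong reference. Releasing the caller's handle before `run()` breaks the first but not the second. Immediate reclamation after `del` is CPython behaviour; the `gc.collect()` call is there for other interpreters.

```python
import gc
import weakref


class InputBuffer:
    """Hypothetical caller-owned input; a plain class so it supports weakrefs."""

    def __init__(self, values):
        self.values = list(values)


class BorrowingVM:
    # Non-owning: models keeping a raw pointer to the caller's DLTensor.
    def set_input(self, buf):
        self._ref = weakref.ref(buf)

    def run(self):
        buf = self._ref()
        if buf is None:
            # In C++ this would be a dangling pointer (undefined behaviour);
            # in Python the weak reference lets us detect it instead.
            raise RuntimeError("input was released before run()")
        return sum(buf.values)


class OwningVM:
    # Owning: a strong reference keeps the buffer alive until run().
    def set_input(self, buf):
        self._buf = buf

    def run(self):
        return sum(self._buf.values)


def release_then_run(vm):
    buf = InputBuffer([1, 2, 3])
    vm.set_input(buf)
    del buf       # the caller drops its only handle
    gc.collect()  # CPython frees it at `del`; this helps elsewhere
    return vm.run()


print(release_then_run(OwningVM()))  # 6
try:
    release_then_run(BorrowingVM())
except RuntimeError as err:
    print(err)  # input was released before run()
```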