[relay][pass] Annotation for heterogeneous compilation #2361
I just got back from vacation, will review this PR tonight, looks like great work 👍
@jroesch Thank you. Please take a look when you have time.
@zhiics sorry I had done a partial review and forgot to hit submit, will finish rest and post comments right now.
jroesch
left a comment
Overall looks like a good first pass, most of my comments are small nits about the comments. It would be good to solicit review from someone who is familiar with heterogeneous-execution.
tmoreau89
left a comment
Thank you @zhiics for this excellent contribution. Quick question: how hard would it be to construct a heterogeneous test case (as test_pass_annotation.py) to run on the VTA simulator? It would be interesting to be able to provide explicit control over what components of the graphs get offloaded to CPU vs. VTA with this approach.
@tmoreau89 It shouldn't be hard if you only want to offload most of the ops to VTA and keep only a few on the CPU. Otherwise, I think it might be a little tedious to traverse the program and add
07b5ce3 to 789d510
@zhiics Could you use git rebase so that it will include your commits?
oops. Sorry. I will rebase now.
789d510 to 074f95d
yzhliu
left a comment
haven't finished yet, will continue tonight. A high level question: for now users do need to write relay.on_device explicitly for annotation?
@yzhliu Yes, we need to use
related RFC #2391
@jroesch Yes, I actually agree with you. We can add other passes for different annotation schemes, but users should have the flexibility to annotate expressions from the language directly.
std::vector<StorageToken*> tokens;
int device_id = node_device_map_.count(GetRef<Expr>(op))
                    ? node_device_map_[GetRef<Expr>(op)]->value
                    : 0;
shall we add 0 as a placeholder in DLDeviceType so that others will not use it for other special purpose by mistake.
also device_id -> device_type
@yzhliu Yes, I also thought that it's probably necessary to have a field in DLDeviceType, like 'kDLUNDEFINED = 0'. Let's keep this for now. I will send an RFC later to hear from more people because it needs a slight change in dlpack.
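The lookup under review falls back to 0 when an expression has no entry in the device map. A minimal Python sketch of the placeholder idea discussed here, assuming a hypothetical enum with an UNDEFINED = 0 member (this thread only proposes such a value for dlpack; DeviceType and device_type_of are illustrative names, not part of the PR):

```python
from enum import IntEnum


class DeviceType(IntEnum):
    """Hypothetical device-type enum; UNDEFINED = 0 is the proposed placeholder."""
    UNDEFINED = 0  # assumption: proposed in this thread, not an existing dlpack value
    CPU = 1        # same value as dlpack's kDLCPU
    GPU = 2        # same value as dlpack's kDLGPU


def device_type_of(node, node_device_map):
    """Return the annotated device type of `node`, or UNDEFINED if it has none."""
    return node_device_map.get(node, DeviceType.UNDEFINED)


# Only "conv2d" carries an annotation; "add" falls through to the placeholder.
node_device_map = {"conv2d": DeviceType.GPU}
annotated = device_type_of("conv2d", node_device_map)
missing = device_type_of("add", node_device_map)
```

Reserving 0 under an explicit name means a missing annotation cannot be mistaken for a real device type.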
add fallback; cpptest; fix lint; accept both nn.op_name and op_name; accept both nn.op_name and op_name; use expr annotation instead of op names; fix test_back_graph_runtime unit test
add fallback; cpptest; fix lint; accept both nn.op_name and op_name; accept both nn.op_name and op_name; use expr annotation instead of op names; fix test_back_graph_runtime unit test; address @jroesch's comments
72121db to 7e6be33
@jroesch Please take another look. We can probably bring it in if everything looks good to you. Thanks.
jroesch
left a comment
Reviewed the delta, looks good to me 👍
This PR adds passes in Relay that annotate expressions to indicate which device/context each operator should be executed on. #2296 is the proposed RFC. The following changes are made in this PR:
- Extend target to accept a dictionary mapping device/context to target, and add the fallback_device argument to build in build_module.py: build(func, target=None, target_host=None, params=None, fallback_device=None)
- Slightly modify the compile engine to stop lowering and generating schedules for device copy operators, since the actual data transfer is performed only at runtime and no schedule is needed for this type of operator.
- Modify the memory plan to return both storage ids and device ids.
- Add on_device(expr, dev_id) and device_copy(expr, src_dev_id, dst_dev_id) operators as synthetic ops, as @jroesch suggested. The former takes an expr and a device_id as inputs, indicating the device the expression is annotated with. The latter performs data copies between different devices.
- Write several passes to validate the annotated program, rewrite the program (e.g. insert device copy operators), propagate the device information from device copy operators to the other expressions, etc.
- Add unit tests for the different annotation schemes.
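To make the rewrite step concrete, here is a toy Python sketch of the idea: unannotated nodes get a fallback device, and a device_copy node is inserted wherever a producer and its consumer sit on different devices. This is not the PR's implementation and does not use TVM; Node, insert_device_copies, and the integer device ids are hypothetical, and on_device here is only a stand-in for the annotation operator described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy expression node; `device` is None when no annotation was given."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    device: Optional[int] = None


def on_device(node: Node, dev_id: int) -> Node:
    """Toy stand-in for the annotation op: record the device on the node."""
    node.device = dev_id
    return node


def insert_device_copies(node: Node, fallback_device: int, _seen=None) -> Node:
    """Give unannotated nodes the fallback device, then wrap every input whose
    device differs from its consumer's in a device_copy node."""
    seen = set() if _seen is None else _seen
    if id(node) in seen:  # shared sub-expressions are rewritten only once
        return node
    seen.add(id(node))
    if node.device is None:
        node.device = fallback_device
    new_inputs = []
    for inp in node.inputs:
        inp = insert_device_copies(inp, fallback_device, seen)
        if inp.device != node.device:
            # The copy node is placed on the consumer's (destination) device.
            inp = Node("device_copy", [inp], device=node.device)
        new_inputs.append(inp)
    node.inputs = new_inputs
    return node


CPU, GPU = 1, 2  # illustrative device ids
x = Node("var_x")
y = Node("var_y")
add = on_device(Node("add", [x, y]), GPU)  # explicitly annotated
out = Node("exp", [add])                   # unannotated: uses the fallback
result = insert_device_copies(out, fallback_device=CPU)
```

In this sketch the add node stays on the GPU, its two inputs are copied in from the CPU, and its result is copied back to the CPU for exp.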
@tqchen @jroesch @yidawang @yzhliu @tmoreau89