Problem
The current graph runtime assumes that all ops run on a single context (OpenCL or ARM). In reality we often need to switch between devices, especially for the last few layers of detector models, where it can be hard to get a GPU version of the detector operators (@Laurawly is working on one, but that is a different story). A better solution would be to enable the graph runtime and builder to split the graph across multiple devices and insert copy operators where necessary.
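To make "split the graph across multiple devices" concrete, here is a minimal sketch. It assumes the graph is already topologically ordered and that every node has been assigned a device string. The node names and the `split_by_device` helper are hypothetical and are not part of the NNVM/TVM API. The sketch groups consecutive nodes on the same device into segments. Each boundary between two segments is a point where a copy operator would be needed.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def split_by_device(order: List[str], placement: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Group a topologically ordered node list into maximal runs on the same device."""
    # groupby only merges *adjacent* nodes with the same key, so each run is one segment.
    return [(dev, list(run)) for dev, run in groupby(order, key=placement.__getitem__)]


# Hypothetical detector tail: the NMS step has no GPU kernel, so it runs on the CPU.
order = ["data", "conv", "nms", "reshape"]
placement = {"data": "gpu", "conv": "gpu", "nms": "cpu", "reshape": "gpu"}
segments = split_by_device(order, placement)
print(segments)
# [('gpu', ['data', 'conv']), ('cpu', ['nms']), ('gpu', ['reshape'])]
```

The number of segments depends on which topological order is chosen. A smarter scheduler could reorder independent nodes to reduce device transitions. This sketch keeps the given order.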
Steps of Changes
- As a first step, let us support mixed host (CPU) and device code, which will solve most of the problems.
- We will need an API discussion for multiple devices: the current graph runtime relies on a single device context, which does not handle this need well. This is a longer-term problem and can be resolved in a second phase.
Proposed API Changes
- Allow operators to register FComputeFallback, which indicates that the operator needs to fall back to its host (CPU) implementation.
- Add a rewrite pass that inserts copy nodes when it encounters nodes that fall back to the host.
- Add an extra column to the deploy graph that records the device placement plan.
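The three proposed changes can be sketched together as follows. This is a minimal, non-authoritative illustration under these assumptions:

* The graph is a topologically ordered list of nodes.
* A plain set of op names stands in for "ops that registered FComputeFallback".
* Devices are the strings `"gpu"` and `"cpu"`.

`Node`, `place`, `insert_copies`, the `device_copy` op name, and the `*_copy_to_*` naming are all hypothetical, not existing NNVM/TVM APIs. The pass returns the rewritten nodes plus a parallel device column, which plays the role of the extra deploy-graph column.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    name: str
    op: str  # "null" marks a graph input in this sketch
    inputs: List[str] = field(default_factory=list)


def place(nodes: List[Node], fallback_ops: Set[str],
          device: str = "gpu", host: str = "cpu") -> Dict[str, str]:
    # Ops that registered a fallback run on the host; everything else stays on the device.
    return {n.name: host if n.op in fallback_ops else device for n in nodes}


def insert_copies(nodes: List[Node], placement: Dict[str, str]) -> Tuple[List[Node], List[str]]:
    """Rewrite the graph so every edge connects two nodes on the same device.

    Returns the new node list and a parallel device column for the deploy graph.
    """
    out: List[Node] = []
    device_of: Dict[str, str] = {}
    copies: Dict[Tuple[str, str], str] = {}  # (source node, target device) -> copy node
    for n in nodes:
        dev = placement[n.name]
        new_inputs = []
        for src in n.inputs:
            if device_of[src] == dev:
                new_inputs.append(src)
                continue
            key = (src, dev)
            if key not in copies:
                # One copy per (tensor, target device), shared by all consumers on that device.
                copies[key] = f"{src}_copy_to_{dev}"
                out.append(Node(copies[key], "device_copy", [src]))
                device_of[copies[key]] = dev  # the copy's output lives on the target device
            new_inputs.append(copies[key])
        out.append(Node(n.name, n.op, new_inputs))
        device_of[n.name] = dev
    return out, [device_of[n.name] for n in out]


graph = [
    Node("data", "null"),
    Node("conv", "conv2d", ["data"]),
    Node("nms", "nms", ["conv"]),
    Node("out", "reshape", ["nms"]),
]
FALLBACK_OPS = {"nms"}  # hypothetical: ops that registered FComputeFallback
nodes, device_column = insert_copies(graph, place(graph, FALLBACK_OPS))
print([n.name for n in nodes])
# ['data', 'conv', 'conv_copy_to_cpu', 'nms', 'nms_copy_to_gpu', 'out']
print(device_column)
# ['gpu', 'gpu', 'cpu', 'cpu', 'gpu', 'gpu']
```

Placing each copy node on its target device is a design choice made for this sketch. A real runtime might instead attribute the copy to the source device or to a dedicated transfer stream.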