Graph Runtime for Heterogeneous Execution #1242

@tqchen

Problem

The current graph runtime assumes that all ops run on a single context (OpenCL or ARM). In reality we usually need to switch between devices, especially for the last few layers in detector models, where it might be hard to get a GPU version of the detector operators (@Laurawly is working on one, but that is a different story). A better solution would be to enable the graph runtime and builder to split the graph across multiple devices and insert copy operators where necessary.

Steps of Changes

    1. As a first step, let us support mixed host (CPU) and device code, which will solve most of the problems.
    2. We will need an API discussion for multiple devices. The current graph runtime relies on a single device context, which does not handle this need well. This is a longer-term problem and can be resolved in a second phase.

Proposed API Changes

  • Allow operators to register FComputeFallback, which indicates that the operator needs to fall back to a host CPU implementation.
  • Add a rewrite pass that inserts a copy node when it encounters a node that falls back to the host.
  • Add a column to the deploy graph that indicates the device placement plan.
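
To make the three proposed changes concrete, here is a minimal Python sketch of how they might fit together on a toy graph. It is not the graph runtime or NNVM API: `Node`, `FALLBACK_OPS`, `place_and_insert_copies`, the `device_copy` op name, and the operator names in the example are all hypothetical stand-ins. `FALLBACK_OPS` plays the role an FComputeFallback registration would play, and the `device` field stands in for the extra placement column in the deploy graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for operators that registered FComputeFallback,
# i.e. operators that only have a host (CPU) implementation.
FALLBACK_OPS: Set[str] = {"multibox_detection"}


@dataclass
class Node:
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)
    device: str = ""  # stand-in for the device placement column


def place_and_insert_copies(
    nodes: List[Node],
    default_device: str = "opencl",
    host_device: str = "cpu",
) -> List[Node]:
    """Assign a device to every node and insert copy nodes on device changes.

    `nodes` must be in topological order (inputs before consumers).
    """
    placed: Dict[str, str] = {}              # node name -> assigned device
    copies: Dict[Tuple[str, str], str] = {}  # (source, target device) -> copy node
    out: List[Node] = []

    for node in nodes:
        device = host_device if node.op in FALLBACK_OPS else default_device
        new_inputs: List[str] = []
        for src in node.inputs:
            if placed[src] == device:
                new_inputs.append(src)
                continue
            # The producer lives on another device: route through a copy node,
            # reusing an existing copy if one was already made for this target.
            key = (src, device)
            if key not in copies:
                copy_name = f"{src}_copy_to_{device}"
                out.append(Node(copy_name, "device_copy", [src], device))
                placed[copy_name] = device
                copies[key] = copy_name
            new_inputs.append(copies[key])
        out.append(Node(node.name, node.op, new_inputs, device))
        placed[node.name] = device
    return out


# Toy detector-style graph: the last operator only has a CPU implementation.
graph = [
    Node("data", "null"),
    Node("conv", "conv2d", ["data"]),
    Node("det", "multibox_detection", ["conv"]),
]
plan = place_and_insert_copies(graph)
for n in plan:
    print(n.name, n.op, n.inputs, n.device)
```

In this sketch the pass inserts a copy whenever a consumer and its producer end up on different devices, so it also covers the case where a device op consumes the output of a host fallback op. Whether the real pass should handle both directions in the first phase is a design question for the API discussion.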

Related Issues

#1242
