Split from #341 (comment), where @wacky6 mentioned:

> Maybe we should subclass MLGraph based on the context that creates it. For example, a CPU context returns an MLCpuGraph with compute(). A GPU context returns an MLGpuGraph with compute() and GPU interop methods (commandBuffer, dispatch, etc).
If we fold the command-recording methods into MLGpuGraph, it may not support recording multiple MLGraphs into one command buffer, which MLCommandEncoder does support. Pipelining model execution that way can reduce GPU queue submission overhead and improve throughput.
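To make the trade-off concrete, here is a rough TypeScript sketch of the proposed subclassing. All names and signatures below are hypothetical illustrations of the idea, not spec'd WebNN API; in particular, `record()` and the placeholder types are assumptions showing how per-graph recording into an externally owned command buffer could still allow multi-graph pipelining:

```typescript
// Hypothetical sketch only -- none of these classes or methods are spec text.
type MLNamedResources = Record<string, Float32Array>;

// Placeholder for a WebGPU-style command buffer owned by the caller.
class GPUCommandBuffer {
  recordedGraphs: string[] = [];
}

// Common base returned by MLGraphBuilder.build().
abstract class MLGraph {
  constructor(readonly name: string) {}
}

// CPU context -> graph with compute() only.
class MLCpuGraph extends MLGraph {
  async compute(inputs: MLNamedResources, outputs: MLNamedResources): Promise<void> {
    // ... execute on CPU ...
  }
}

// GPU context -> graph with compute() plus GPU interop.
class MLGpuGraph extends MLGraph {
  async compute(inputs: MLNamedResources, outputs: MLNamedResources): Promise<void> {
    // ... submit a one-off dispatch to the GPU queue ...
  }

  // Record this graph's dispatch into a caller-owned command buffer, so
  // several graphs can be pipelined into a single queue submission.
  record(commandBuffer: GPUCommandBuffer, inputs: MLNamedResources, outputs: MLNamedResources): void {
    commandBuffer.recordedGraphs.push(this.name);
  }
}

// Pipelining two models into one submission under this sketch:
const buffer = new GPUCommandBuffer();
const detector = new MLGpuGraph("detector");
const classifier = new MLGpuGraph("classifier");
detector.record(buffer, {}, {});
classifier.record(buffer, {}, {});
```

The open question is whether folding `record()` into each graph (as above) is enough, or whether a separate MLCommandEncoder that owns the recording session is a cleaner home for multi-graph pipelining.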
/cc @wchao1115