[WIP][Runtime] Pipeline Executor Add Set and Get Input/Output interfaces. #9494

huajsj · 2021-11-11T06:00:25Z

RFC PR: apache/tvm-rfcs#0014
GitHub Issue: apache/tvm#8596

1. Add a "param" connection to the pipeline config, to support cases such as setting params for module 1 by the param name "param0".
2. Add an implementation that uses the input name to locate the backend runtime index and the input index.
3. Add interfaces such as run/stop/set_input/get_output.
4. Add a serialized implementation of pipeline backend runtime execution, in order to test all of these interfaces.
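
A minimal sketch of how a "param" connection and name-based input lookup might fit together. The config layout and the names `PIPELINE_CONFIG`, `MODULE_INPUTS`, `locate_input` and `locate_param_module` are hypothetical illustrations, not the format or functions used in this PR.

```python
# Hypothetical config: pipeline-level names mapped to a module index
# (and, for inputs, to the module-local input name).
PIPELINE_CONFIG = {
    "input_connection": {
        "data_a": {"mod_idx": 0, "input_name": "data"},
        "data_b": {"mod_idx": 1, "input_name": "data_1"},
    },
    "param_connection": {
        "param0": {"mod_idx": 1},
    },
}

# Hypothetical per-module input lists, used to turn a name into an index.
MODULE_INPUTS = {0: ["data"], 1: ["data_0", "data_1"]}


def locate_input(config, module_inputs, name):
    """Return (backend runtime index, input index) for a pipeline input name."""
    entry = config["input_connection"].get(name)
    if entry is None:
        raise KeyError("unknown pipeline input: %s" % name)
    mod_idx = entry["mod_idx"]
    return mod_idx, module_inputs[mod_idx].index(entry["input_name"])


def locate_param_module(config, params_name):
    """Return the module index that a param connection name is bound to."""
    entry = config["param_connection"].get(params_name)
    if entry is None:
        raise KeyError("unknown param connection: %s" % params_name)
    return entry["mod_idx"]
```

With this layout, `locate_input(PIPELINE_CONFIG, MODULE_INPUTS, "data_b")` resolves to module 1, input 1, and "param0" resolves to module 1.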



huajsj · 2021-11-11T18:46:32Z

@comaniac @masahi please take a look.

comaniac

Please add the RFC and tracking issue information to the PR description. Also please update the tracking issue.


comaniac · 2021-11-12T17:43:47Z

python/tvm/contrib/pipeline_executor.py

+        v = self._get_input(key)
+        if v is None:
+            raise RuntimeError("Could not find '%s' in pipeline's inputs" % key)
+        v.copyfrom(value)


I just realized that you need to do this because all buffers were created when initializing the pipeline executor. In this way we will double the memory usage of global inputs. Can we avoid allocating redundant buffers to reduce memory consumption, similar to graph executor and VM?

Sure. This logic already seems to follow what graph_executor does, with no redundant buffer creation. Could you give more detail about which part causes the problem?
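
The pattern under discussion can be sketched with a stand-in buffer type. `FakeExecutor` is hypothetical and uses `bytearray` in place of NDArray; it only mirrors the shape of the `set_input` code in the diff above and makes no claim about how graph executor or the VM manage their buffers.

```python
class FakeExecutor:
    """Stand-in executor: input buffers are allocated once at init."""

    def __init__(self, input_sizes):
        # One preallocated buffer per global input name.
        self._inputs = {name: bytearray(size) for name, size in input_sizes.items()}

    def _get_input(self, key):
        # Returns None for unknown names, like the lookup in the diff.
        return self._inputs.get(key)

    def set_input(self, key, value):
        buf = self._get_input(key)
        if buf is None:
            raise RuntimeError("Could not find '%s' in pipeline's inputs" % key)
        if len(value) != len(buf):
            raise ValueError("size mismatch for input '%s'" % key)
        # Copy into the existing buffer. The caller's `value` is still alive,
        # so the data exists twice until the caller releases its copy.
        buf[:] = value
```

Calling `FakeExecutor({"data": 4}).set_input("data", b"\x01\x02\x03\x04")` fills the preallocated buffer, while an unknown key raises `RuntimeError`.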


comaniac · 2021-11-12T17:59:08Z

src/runtime/pipeline/pipeline_struct.h

+    // If the source device and target device is not same, we use a local DLTensor
+    // as a medium to do the cross device copy work.


Suggested change

-    // If the source device and target device is not same, we use a local DLTensor
-    // as a medium to do the cross device copy work.
+    // If the source device and target device are not the same, we use a temporary DLTensor on CPU
+    // as the bridge.

After thinking twice, I feel this may still introduce unnecessary overhead. Isn't it possible for non-CPU device APIs to support `from`? Should we first try TVMArrayCopyFromTo(from, to, nullptr), and then go through the CPU if that fails?

The existing TVMArrayCopyFromTo logic supports the `from` device: when both devices are non-CPU, the copy function calls the `from` device's DeviceAPI function to copy data from `from` to `to`. In that case reading data from the `from` device is not a problem, but writing data to the `to` device may crash, because the `from` device may not be able to access the data of the `to` DLTensor. For example, when the `to` device is VTA, the memory address in the DLTensor is actually a pointer to a VTABuffer data structure rather than the address of the data memory, so using TVMArrayCopyFromTo to copy data directly into the `to` DLTensor causes a memory access violation.

If we revised the logic so that TVMArrayCopyFromTo supported the `to` device instead, then reading data from the `from` device would become the new problem. Hence we use a bridge NDArray here as the solution.
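
The bridge approach described above can be sketched as follows. `FakeDevice` and `copy_via_host` are hypothetical stand-ins, not TVM APIs; the assumption is only that each device can move data between its own memory and the host, and cannot touch another device's memory.

```python
class FakeDevice:
    """Hypothetical device whose memory only it can read or write."""

    def __init__(self, name):
        self.name = name
        self._mem = {}  # opaque handle -> device-private storage

    def alloc(self, handle, size):
        self._mem[handle] = bytearray(size)

    def read_to_host(self, handle):
        # Device-side read: returns a host copy of the device data.
        return bytes(self._mem[handle])

    def write_from_host(self, handle, host_bytes):
        # Device-side write: the device interprets its own handle.
        self._mem[handle][:] = host_bytes


def copy_via_host(src_dev, src_handle, dst_dev, dst_handle):
    """Copy between two devices using a temporary host buffer as the bridge."""
    staging = src_dev.read_to_host(src_handle)    # source device -> host
    dst_dev.write_from_host(dst_handle, staging)  # host -> target device


# Usage: neither device ever dereferences the other's handle.
gpu, vta = FakeDevice("gpu"), FakeDevice("vta")
gpu.alloc("x", 3)
vta.alloc("y", 3)
gpu.write_from_host("x", b"abc")
copy_via_host(gpu, "x", vta, "y")
```

The cost is one extra copy through host memory, which is the overhead raised in the review; the benefit is that each side only uses its own device's read or write path.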

huajsj · 2021-11-16T19:02:49Z

Thanks @comaniac, all review comments are addressed, please take a look.

huajsj · 2021-11-29T21:05:44Z

@comaniac @masahi

masahi · 2021-12-01T13:04:29Z

I can take a look this weekend / next week.

huajsj · 2021-12-01T18:32:07Z

thanks @masahi.

masahi

I started reviewing, but on the first file I saw many basic grammar errors. I have to say you didn't learn from previous PRs. I won't review 1K code like this unless you make some effort to make reviewing easier.

masahi · 2021-12-03T03:50:06Z

python/tvm/contrib/pipeline_executor.py

+        self._stop()
+
+    def set_input(self, key, value):
+        """Set inputs to the module via "value".


via "value" doesn't make sense

masahi · 2021-12-03T03:50:27Z

python/tvm/contrib/pipeline_executor.py

+        v.copyfrom(value)
+
+    def set_params(self, params_name, params_data):
+        """Set params to the module via param name and params data.


Choose param or params

masahi · 2021-12-03T03:53:13Z

python/tvm/contrib/pipeline_executor.py

+            self._set_param(params_name, key, val)
+
+    def get_input(self, key):
+        """Get the input via a input name.


masahi · 2021-12-03T03:53:29Z

python/tvm/contrib/pipeline_executor.py

+        Returns
+        -------
+        data : NDArray
+            Then input data.


masahi · 2021-12-03T03:54:23Z

python/tvm/contrib/pipeline_executor.py

+            The params name
+
+        params_data : dict of str to NDArray
+            A list of params data and params key name.


what is the difference between params key and params_name

masahi · 2021-12-03T03:55:20Z

python/tvm/contrib/pipeline_executor.py

+            A list of params data and params key name.
+        """
+        for key, val in params_data.items():
+            self._set_param(params_name, key, val)


Related to the above comment, this is a weird API.
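
One possible reading of the two names, sketched under the assumption that `params_name` (for example "param0" from the PR description) selects the target module, while each key in `params_data` names a single parameter inside that module. `FakeParamTable` is a hypothetical stand-in, not the PR's implementation.

```python
class FakeParamTable:
    """Stand-in for the executor's param handling."""

    def __init__(self, param_connection):
        # params_name -> module index, e.g. {"param0": 1} (assumed layout).
        self._param_connection = dict(param_connection)
        # module index -> {param key: value}
        self.module_params = {}

    def _set_param(self, params_name, key, val):
        if params_name not in self._param_connection:
            raise RuntimeError("Could not find '%s' in pipeline's params" % params_name)
        mod_idx = self._param_connection[params_name]
        self.module_params.setdefault(mod_idx, {})[key] = val

    def set_params(self, params_name, params_data):
        # Same loop as in the diff: one _set_param call per (key, value) pair.
        for key, val in params_data.items():
            self._set_param(params_name, key, val)
```

Under this reading, `set_params("param0", {"weight": w, "bias": b})` stores two parameters, both on the module that "param0" is connected to.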

huajsj · 2021-12-03T06:05:45Z

Thanks @masahi for the review. Besides fixing the grammar issues, do you think splitting this PR into a couple of smaller PRs would make it more review-friendly? I will split this PR if you think that would help the review.

masahi · 2021-12-03T06:21:21Z

Thanks @masahi for the review. Besides fixing the grammar issues, do you think splitting this PR into a couple of smaller PRs would make it more review-friendly? I will split this PR if you think that would help the review.

Yes! That sounds great, thanks.

huajsj · 2021-12-03T18:24:31Z

Thanks @masahi @comaniac for the review, and sorry for the inconvenience caused by the large PR. I am starting to split this PR into 4 smaller PRs to make it more review-friendly. The splitting plan follows, and the associated tracking issue has also been updated. The current PR will be kept open for a while to track review comments.

1. Pipeline executor set_input implementation and input configuration loading.
2. Pipeline executor set_params implementation and params configuration loading.
3. Pipeline executor runtime interface declaration and implementation.
4. Pipeline executor pipeline sequence execution.

huajsj requested review from ZihengJiang, areusch, comaniac, icemelon, jroesch, junrushao, kazum, liangfu, masahi, merrymercy, tmoreau89, tqchen, vinx13, yzhliu and zhiics as code owners November 11, 2021 06:00

comaniac requested changes Nov 11, 2021


huajsj mentioned this pull request Nov 11, 2021

[RFC][Tracking Issue] Pipeline Executor For Compute graph pipeline #8596

Closed

15 tasks

comaniac reviewed Nov 12, 2021


huajsj added 2 commits November 11, 2021 21:44

address review comments.

460c316

Remove set_input and set_param array CPU target binding.

5bfe5af

comaniac reviewed Nov 12, 2021


address review comments.

585060c

huajsj requested a review from comaniac November 13, 2021 01:01

masahi requested changes Dec 3, 2021


adress review comments.

37288fe

huajsj changed the title ~~[Runtime] Pipeline Executor Add Set and Get Input/Output interfaces.~~ [WIP][Runtime] Pipeline Executor Add Set and Get Input/Output interfaces. Dec 3, 2021

huajsj closed this Jan 29, 2022
