[Bug] [BYOC] AoT Codegen produces invalid packed function call for relay models using multi-output subgraphs

I recently ran into segmentation faults when working with rather large BYOC subgraphs and the AoT executor for MicroTVM. The issue seems to be related to BYOC subgraphs with more than one output: I was able to reproduce it with a relatively simple test case that uses only additions and the `ccompiler` codegen.

### Expected behavior

Simplified test case (see below for the full example):

```
    ...
    # Inputs and Weights
    x = relay.var("x", shape=(10, 10))
    w0 = relay.var("w0", shape=(10, 10))
    w1 = relay.var("w1", shape=(10, 10))
    w2 = relay.var("w2", shape=(10, 10))

    # C compiler

    # z0 = x + w0
    x_ = compiler_begin(x, "ccompiler")
    w0_ = compiler_begin(w0, "ccompiler")
    z0_ = relay.add(x_, w0_)
    z0 = compiler_end(z0_, "ccompiler")

    # z1 = z0 + w1
    z0__ = compiler_begin(z0, "ccompiler")
    w1_ = compiler_begin(w1, "ccompiler")
    z1_ = relay.add(z0__, w1_)
    z1 = compiler_end(z1_, "ccompiler")

    # TVM Compiler

    # z2 = z0 + z1
    z2 = relay.add(z0, z1)

    f = relay.Function([x, w0, w1], z2)
    mod = tvm.IRModule()
    mod["main"] = f

    if merge_compiler_regions:
        mod = transform.MergeCompilerRegions()(mod)

    mod = transform.PartitionGraph("mod_name")(mod)
    mod = transform.InferType()(mod)
    ...
```

Running the test should not result in any failures.
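For reference, the values the test is expected to produce can be written down without TVM. The following is a minimal pure-Python sketch of the graph above (`z0 = x + w0`, `z1 = z0 + w1`, `z2 = z0 + z1`); the helper names `add` and `reference_output` are only illustrative and are not part of the test script.

```python
def add(a, b):
    """Element-wise addition of two equally shaped 2-D lists."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reference_output(x, w0, w1):
    """Mirror the Relay graph of the simplified test case."""
    z0 = add(x, w0)     # first add, offloaded to "ccompiler"
    z1 = add(z0, w1)    # second add, offloaded to "ccompiler"
    return add(z0, z1)  # final add, compiled by TVM


def full(value, shape=(10, 10)):
    """Build a constant matrix with the given shape."""
    return [[value] * shape[1] for _ in range(shape[0])]


z2 = reference_output(full(1.0), full(2.0), full(3.0))
print(z2[0][0])
```

Any output that differs from this reference points at a problem in the generated code rather than in the model definition.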

### Actual behavior

The test with `merge_compiler_regions=True` fails, while the one with only one `relay.add` per subgraph (i.e. without merging the compiler regions) finishes successfully.

### Investigation

The problem seems to be that in the `tvmgen_my_mod_run_model` function generated by the AoT codegen, the TVM function `tvmgen_my_mod_fused_add` is assumed to have 2 arguments while it actually has 3 (i.e. 2 inputs and 1 output). Therefore the last argument is not packed properly, and its slot still points to the 3rd argument of the previous packed function call instead of the model output.
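The mismatch can be stated as a simple consistency check. The sketch below is not TVM code: `PackedSignature` and `check_call` are hypothetical helpers, and it assumes that a packed function receives its inputs first, followed by one slot per output buffer.

```python
from dataclasses import dataclass


@dataclass
class PackedSignature:
    """Hypothetical description of a packed function (not a TVM class)."""
    name: str
    num_inputs: int
    num_outputs: int

    @property
    def num_args(self) -> int:
        # Assumption: inputs come first, followed by one slot per output buffer.
        return self.num_inputs + self.num_outputs


def check_call(sig: PackedSignature, num_args_passed: int) -> None:
    """Raise if the caller packs a different number of arguments than the callee reads."""
    if num_args_passed != sig.num_args:
        raise ValueError(
            f"{sig.name}: expected {sig.num_args} packed arguments, got {num_args_passed}"
        )


fused_add_sig = PackedSignature("tvmgen_my_mod_fused_add", num_inputs=2, num_outputs=1)
check_call(fused_add_sig, 3)      # consistent call, passes silently
try:
    check_call(fused_add_sig, 2)  # the argument count used by the generated caller
except ValueError as err:
    print(err)
```

A check of this kind at code generation time would presumably surface the problem before any C code is emitted.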


With the default `aot_test_utils.py`, the test "just" fails with an output value mismatch, because all model inputs are declared as non-constant. In https://github.com/PhilippvK/tvm/commit/2bb77f8324023b4646f52365f38950d36503b8f1 I modified the `create_header_file` function to store model inputs as constants, which leads to the mentioned segmentation faults caused by writing to a `const` variable.

As the error is only present for `merge_compiler_regions=True`, it should not be directly related to Tuple inputs.

Relay model after partitioning:

```
def @main(%x: Tensor[(10, 10), float32], %w0: Tensor[(10, 10), float32], %w1: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = @tvmgen_mod_name_ccompiler_main_0(%x, %w0, %w1) /* ty=(Tensor[(10, 10), float32], Tensor[(10, 10), float32]) */;
  %1 = %0.0;
  %2 = %0.1;
  add(%1, %2) /* ty=Tensor[(10, 10), float32] */
}

def @tvmgen_mod_name_ccompiler_main_0(%ccompiler_0_i0: Tensor[(10, 10), float32], %ccompiler_0_i1: Tensor[(10, 10), float32], %ccompiler_0_i2: Tensor[(10, 10), float32], Inline=1, Compiler="ccompiler", global_symbol="tvmgen_mod_name_ccompiler_main_0", Primitive=1) -> (Tensor[(10, 10), float32], Tensor[(10, 10), float32]) {
  %3 = add(%ccompiler_0_i0, %ccompiler_0_i1) /* ty=Tensor[(10, 10), float32] */;
  %4 = add(%3, %ccompiler_0_i2) /* ty=Tensor[(10, 10), float32] */;
  (%3, %4)
}
```

This is the incorrect code snippet generated by the AoT codegen:

```
  TVMValue stack4[6];
  void* tvm_value_2 = stack4;
  (((DLTensor*)tvm_value_2)[0].data) = sid_3;
  (((TVMValue*)stack_value)[0].v_handle) = tvm_value_2;
  ((int32_t*)stack_tcode)[(0)] = 3;
  (((TVMValue*)stack_value)[1].v_handle) = output;
  ((int32_t*)stack_tcode)[(1)] = 3;
  TVMValue ret_val1;
  int ret_type_code1;
  if (tvmgen_my_mod_fused_add( (TVMValue*) stack_value , (int*) stack_tcode, 2, &ret_val1, &ret_type_code1, NULL) != 0){
    return -1;
  }
  return 0;
```
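The effect of this call can be illustrated with a toy model of a shared argument stack. This is a sketch, not TVM code: the buffer values, the contents left behind by the previous call, and the helpers `fused_add`, `call_packed` and `run_call` are assumptions made for illustration. Only the pattern of refreshing two slots while the callee reads three is taken from the snippet above.

```python
def fused_add(stack):
    """Callee: reads slots 0 and 1 and writes the element-wise sum into slot 2."""
    lhs, rhs, out = stack[0], stack[1], stack[2]
    for i in range(len(out)):
        out[i] = lhs[i] + rhs[i]


def call_packed(stack, func, args):
    """Caller: overwrites only as many slots as it believes the callee takes."""
    for i, arg in enumerate(args):
        stack[i] = arg
    func(stack)


def run_call(pack_all_args):
    w1 = [10.0, 20.0]                      # stands in for a model input
    sid_2, sid_3 = [1.0, 2.0], [3.0, 4.0]  # intermediate buffers (made-up values)
    output = [0.0, 0.0]
    # Assumed state after the previous packed call: slot 2 still holds w1.
    stack = [[0.0, 0.0], [0.0, 0.0], w1, sid_2, sid_3]
    args = [sid_2, sid_3, output] if pack_all_args else [sid_3, output]
    call_packed(stack, fused_add, args)
    return w1, output


buggy_w1, buggy_out = run_call(pack_all_args=False)
fixed_w1, fixed_out = run_call(pack_all_args=True)
print("two slots packed:  ", buggy_w1, buggy_out)
print("three slots packed:", fixed_w1, fixed_out)
```

In the two-slot case the result lands in the stale buffer and the model output is never written, which would match both the value mismatch and, once the inputs are `const`, the segmentation fault.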

### Environment

Operating System: Ubuntu 18.04 & 20.04

TVM Version: 1fd8f610953adc39cbd18d82f4a9e92a11575dfc (latest)

Python Version: Python v3.6

### Steps to reproduce


The [full pytest script](https://github.com/PhilippvK/tvm/blob/d20ddbd4a6b3a8f4aa5fc1bf467dc4f05c15bb1c/tests/python/relay/aot/test_crt_aot_bug.py) for reproducing the issue can be found [here](https://github.com/PhilippvK/tvm/tree/reproduce-byoc-aot-bug), along with the mentioned [modifications](https://github.com/PhilippvK/tvm/commit/2bb77f8324023b4646f52365f38950d36503b8f1) to the AoT test helper script.

The tests can be run using the following command:

```
export PYTHONPATH=$(pwd)/python
python3 -m pytest tests/python/relay/aot/test_crt_aot_bug.py -s
```

(The `-s` flag is useful for inspecting the Relay model before and after partitioning, which is printed during the test.)

Make sure to compile TVM with MicroTVM and LLVM support!

[Bug] [BYOC] AoT Codegen produces invalid packed function call for relay models using multi-output subgraphs #9036
