Conversation

@junrushao
Member

This reverts commit aa4587f.

Unfortunately, this PR broke MLC LLM's build pipeline. More specifically, the command below

```bash
python3 -m mlc_llm.build --model dist/models/llama-2-13b-chat-hf/ --quantization q4f16_1
```

leads to the following error message:

<details>

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/scratch/junrushao/mlc-llm/mlc_llm/build.py", line 13, in <module>
    main()
  File "/opt/scratch/junrushao/mlc-llm/mlc_llm/build.py", line 10, in main
    core.build_model_from_args(parsed_args)
  File "/opt/scratch/junrushao/mlc-llm/mlc_llm/core.py", line 616, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/mlc-llm/mlc_llm/utils.py", line 258, in convert_weights
    vm["transform_params"]()
  File "/opt/scratch/junrushao/tvm-dev/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/opt/scratch/junrushao/tvm-dev/python/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/opt/scratch/junrushao/tvm-dev/src/runtime/relax_vm/vm.cc", line 634, in tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
    clo->impl.CallPacked(TVMArgs(values.data(), tcodes.data(), args.size() + 1), rv);
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/src/runtime/relax_vm/vm.cc", line 708, in operator()
    *rv = static_cast<VirtualMachineImpl*>(ctx_ptr)->InvokeBytecode(gf_idx, inputs);
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/src/runtime/relax_vm/vm.cc", line 765, in tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
    RunLoop();
  File "/opt/scratch/junrushao/tvm-dev/src/runtime/relax_vm/vm.cc", line 890, in tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
    this->RunInstrCall(curr_frame, instr);
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/src/runtime/relax_vm/vm.cc", line 843, in tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
    this->InvokeClosurePacked(func_pool_[instr.func_idx], args, &ret);
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/mlc-llm/mlc_llm/relax_model/param_manager.py", line 579, in set_item
    loaded_params[i] = tvm.nd.array(computed_param, device=device_cpu)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/python/tvm/runtime/ndarray.py", line 635, in array
    return empty(arr.shape, arr.dtype, device, mem_scope).copyfrom(arr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/python/tvm/runtime/ndarray.py", line 390, in empty
    dtype = DataType(dtype)
            ^^^^^^^^^^^^^^^
  File "/opt/scratch/junrushao/tvm-dev/python/tvm/_ffi/runtime_ctypes.py", line 174, in __init__
    raise ValueError("Do not know how to handle type %s" % type_str)
ValueError: Do not know how to handle type object
```

</details>

To briefly explain the root cause of this issue: the `set_item` method, as defined [here](https://github.com/mlc-ai/mlc-llm/blob/4f4a93f03fed3d900605c02de575d7d5f429ed79/mlc_llm/relax_model/param_manager.py#L576-L579), receives `computed_param=None` after this commit, and `tvm.nd.array` cannot convert `None` into an NDArray.
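
For reference, the failure can be reproduced in isolation. The snippet below is a minimal sketch of my own (not code from MLC LLM), showing that passing `None` where an array is expected produces the same `ValueError` seen in the traceback:

```python
import tvm

# What `set_item` ends up receiving after the offending commit.
computed_param = None

# NumPy coerces None into a 0-d array with dtype "object", which TVM's
# DataType constructor cannot handle, so this raises:
#   ValueError: Do not know how to handle type object
tvm.nd.array(computed_param, device=tvm.cpu())
```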

As a temporary solution, I'd like to propose reverting this commit for now so that LLM model builds are quickly unblocked, and I'm happy to bring it back as soon as the issue is fixed.
@junrushao junrushao marked this pull request as ready for review October 1, 2023 22:00
@junrushao
Member Author

CC: @csullivan @Lunderberg

@Lunderberg
Contributor

Thank you, and I've identified why this happens. `CodeGenVM` has a special case that handles trivial re-bindings of a variable by having both variables share the same register. While `KillAfterLastUse` checks for this case in `VarBinding` nodes, it does not also check for it in `MatchCast` nodes.

Working on two fixes now, one for the short term and one for the long term. The short-term fix will add similar handling to `KillAfterLastUse` so that it also checks for trivial rebindings through a `MatchCast` node. The long-term fix will remove the special case for trivial rebindings in `CodeGenVM` and instead apply `CanonicalizeBindings` as part of the lowering flow to remove all such cases. This way, we get the same codegenned module, but upstream passes do not need to know exactly what `CodeGenVM` does internally.
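
To illustrate the pattern (a hand-written sketch, not the failing module from MLC LLM): a trivial rebinding through `R.match_cast` looks like the example below. In the current codegen, `lv1` and `lv0` share a VM register, so an `R.memory.kill_tensor(lv0)` inserted by `KillAfterLastUse` would also invalidate `lv1`.

```python
from tvm.script import relax as R

@R.function
def main(x: R.Tensor((4, 4), "float16")) -> R.Tensor((4, 4), "float16"):
    lv0 = R.add(x, x)
    # Trivial rebinding: same value, same shape/dtype, so CodeGenVM reuses
    # lv0's register for lv1 instead of allocating a new one.
    lv1 = R.match_cast(lv0, R.Tensor((4, 4), "float16"))
    return lv1
```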

@Lunderberg
Contributor

The short-term PR is submitted at #15854, which adds another special case in `KillAfterLastUse` to work around the special case in `CodeGenVM`.

@junrushao
Member Author

Thanks for quickly acting on this! Let’s get #15854 merged instead!

@junrushao junrushao closed this Oct 2, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Oct 2, 2023
Prior to this commit, if a relax variable was a trivial rebinding of
another variable, through either `VarBinding` or `MatchCast` nodes,
the two relax variables would share the same register in the VM.  As
a result, any upstream transform that deletes an object with
`R.memory.kill_tensor` or `R.memory.kill_storage` must be aware of
this VM behavior and output only one such instruction for each set of
aliased registers.

This commit updates the VM to produce one register for each aliased
relax variable.  The trivial bindings can be removed by applying
`relax.transform.CanonicalizeBindings`, instead of being implicitly
de-duplicated at the codegen level.

This PR is a follow-up to apache#15854, providing a better long-term
solution, though it may have knock-on effects that must also be
resolved.  In addition, it adds a usage example for the bug reported
in apache#15852, to avoid recurrence of similar issues.
@Lunderberg
Contributor

Sounds good! I now also have #15855 prepared, which applies the more general fix and can be merged at a more leisurely pace.
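
As a rough sketch of the long-term direction (the helper below is my own illustration, not the exact change in #15855): trivial rebindings get folded away by a regular Relax pass before codegen, so `CodeGenVM` no longer needs its register-sharing special case.

```python
import tvm
from tvm import relax

def canonicalize_before_codegen(mod: tvm.IRModule) -> tvm.IRModule:
    # CanonicalizeBindings replaces trivial `y = x` bindings (including
    # trivial match_cast rebindings) with direct uses of `x`, so each
    # remaining variable gets its own VM register at codegen time.
    return relax.transform.CanonicalizeBindings()(mod)
```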

Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Oct 3, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Oct 24, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Oct 25, 2023
Lunderberg added a commit that referenced this pull request Oct 31, 2023
* [Unity] Ensure one VM register for each relax binding

* Add unit test to validate that the alias is preserved

* Use the new GetBoundValue utility function

* Update VMTIRCodeGen to also avoid de-duplication of bindings

* Move the callback definition to tvm.relax.testing.vm namespace