[LLVM][Codegen] Avoid segfault when arith::GetVScaleValues returns empty vector #18586
Conversation
Code Review
This pull request fixes a segmentation fault in the AArch64 LLVM code generator that occurs when arith::GetVScaleValues returns an empty vector. The original code would crash when calling std::max_element on an empty vector. The fix adds a check to ensure the vector is not empty before processing it. This change is correct and effectively prevents the crash, improving the robustness of the code generator.
cc @cbalint13
The reviewed hunk in src/target/llvm/codegen_aarch64.cc, with the guard added by this PR:

```cpp
if (!kVScaleValues.empty()) {
  unsigned int max_val = *std::max_element(kVScaleValues.begin(), kVScaleValues.end());
  func->addFnAttr(
      llvm::Attribute::getWithVScaleRangeArgs(*llvm_target_->GetContext(), 1, max_val));
}
```
It would be fine to guard it this way, but we would miss the origin of the issue.
My concern is: if SVE/SME ISA presence is deduced from a valid target, why is the list empty!?
For a proper fix, I would be curious what the string value of target is and what value vector_width has.
Here is the generator part responsible for the population; a missing vector_width (zero) will not populate:
tvm/src/arith/scalable_expression.cc
Lines 106 to 123 in 6248b5d
How vector_width is populated:
- I am afraid that TVM fails to pick up the proper vector_width during the initial parsing of arch properties based on the provided target; the initial default value population happens here:
  tvm/src/target/llvm/llvm_instance.cc
  Line 913 in 6248b5d
- The VecWidth default may later be overridden based on vector ISA presence and properties; the riscv case is more complex:
  tvm/src/target/llvm/llvm_instance.cc
  Line 305 in 6248b5d
If you could help me with these two pieces of info, I can come up with a proper fix.
I could also go the long way and test it on an arm machine, but then I need a clear reproducer case.
Thanks @mshr-h !
LATER EDIT:
I updated (edited) my comments here in order to get a clean view.
Thank you for your comment @cbalint13 !
I'm targeting Apple M4 Pro. It seems like vector_width is 128.
Here's a repro.
Code:

```python
import torch
import torchvision
from torch.export import export

import tvm
from tvm.relax.frontend.torch import from_exported_program

tvm.support.describe()

torch_model = torchvision.models.resnet18(weights=None).eval()
example_args = (torch.randn(1, 3, 224, 224),)
exported_program = export(
    torch_model,
    args=example_args,
)

# Relax
target = tvm.target.Target(
    "llvm -keys=arm_cpu,cpu -mcpu=apple-m4 -mtriple=arm64-apple-darwin25.2.0"
)  # Apple M4 Pro target
print(f"vector_width: {tvm.get_global_func('target.llvm_get_vector_width')(target)}")
mod = from_exported_program(exported_program)
exe = tvm.compile(mod, target=target)
```

Output:
% uv run relax/repro_segfault.py
Python Environment
TVM version = 0.23.dev0
Python version = 3.13.11 (main, Dec 9 2025, 20:26:22) [Clang 21.1.4 ] (64 bit)
os.uname() = Darwin 25.2.0 Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:56 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6041 arm64
CMake Options:
{
"BACKTRACE_ON_SEGFAULT": "OFF",
"BUILD_DUMMY_LIBTVM": "OFF",
"BUILD_STATIC_RUNTIME": "OFF",
"COMPILER_RT_PATH": "3rdparty/compiler-rt",
"CUDA_VERSION": "NOT-FOUND",
"DMLC_PATH": "3rdparty/dmlc-core/include",
"GIT_COMMIT_HASH": "6248b5db43505fbcfb13cc289d11877d5d2649e8",
"GIT_COMMIT_TIME": "2025-12-13 02:29:23 -0500",
"HIDE_PRIVATE_SYMBOLS": "OFF",
"INDEX_DEFAULT_I64": "ON",
"INSTALL_DEV": "OFF",
"LLVM_VERSION": "21.1.7",
"MLIR_VERSION": "NOT-FOUND",
"PICOJSON_PATH": "3rdparty/picojson",
"RANG_PATH": "3rdparty/rang/include",
"ROCM_PATH": "/opt/rocm",
"SUMMARIZE": "OFF",
"TVM_BUILD_PYTHON_MODULE": "OFF",
"TVM_CLML_VERSION": "",
"TVM_CXX_COMPILER_PATH": "/usr/bin/c++",
"TVM_DEBUG_WITH_ABI_CHANGE": "OFF",
"TVM_LOG_BEFORE_THROW": "OFF",
"USE_ALTERNATIVE_LINKER": "AUTO",
"USE_AMX": "OFF",
"USE_ARM_COMPUTE_LIB": "OFF",
"USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR": "OFF",
"USE_BLAS": "none",
"USE_BNNS": "OFF",
"USE_BYODT_POSIT": "OFF",
"USE_CCACHE": "AUTO",
"USE_CLML": "OFF",
"USE_CLML_GRAPH_EXECUTOR": "OFF",
"USE_COREML": "OFF",
"USE_CPP_RPC": "ON",
"USE_CPP_RTVM": "",
"USE_CUBLAS": "OFF",
"USE_CUDA": "OFF",
"USE_CUDNN": "OFF",
"USE_CURAND": "OFF",
"USE_CUSTOM_LOGGING": "OFF",
"USE_CUTLASS": "OFF",
"USE_DNNL": "OFF",
"USE_FALLBACK_STL_MAP": "OFF",
"USE_GTEST": "AUTO",
"USE_HEXAGON": "OFF",
"USE_HEXAGON_EXTERNAL_LIBS": "OFF",
"USE_HEXAGON_GTEST": "/path/to/hexagon/gtest",
"USE_HEXAGON_RPC": "OFF",
"USE_HEXAGON_SDK": "/path/to/sdk",
"USE_HIPBLAS": "OFF",
"USE_IOS_RPC": "OFF",
"USE_KHRONOS_SPIRV": "OFF",
"USE_LIBBACKTRACE": "AUTO",
"USE_LIBTORCH": "OFF",
"USE_LLVM": "/opt/homebrew/opt/llvm/bin/llvm-config",
"USE_METAL": "OFF",
"USE_MIOPEN": "OFF",
"USE_MKL": "OFF",
"USE_MLIR": "ON",
"USE_MRVL": "OFF",
"USE_MSC": "OFF",
"USE_MSCCL": "OFF",
"USE_MSVC_MT": "OFF",
"USE_NCCL": "OFF",
"USE_NNAPI_CODEGEN": "OFF",
"USE_NNAPI_RUNTIME": "OFF",
"USE_NNPACK": "OFF",
"USE_NVSHMEM": "OFF",
"USE_NVTX": "OFF",
"USE_OPENCL": "OFF",
"USE_OPENCL_ENABLE_HOST_PTR": "OFF",
"USE_OPENCL_EXTN_QCOM": "NOT-FOUND",
"USE_OPENCL_GTEST": "/path/to/opencl/gtest",
"USE_OPENMP": "OFF",
"USE_PAPI": "OFF",
"USE_RANDOM": "ON",
"USE_RCCL": "OFF",
"USE_ROCBLAS": "OFF",
"USE_ROCM": "OFF",
"USE_RPC": "ON",
"USE_RTTI": "ON",
"USE_RUST_EXT": "OFF",
"USE_SORT": "ON",
"USE_SPIRV_KHR_INTEGER_DOT_PRODUCT": "OFF",
"USE_TENSORFLOW_PATH": "none",
"USE_TENSORRT_CODEGEN": "OFF",
"USE_TENSORRT_RUNTIME": "OFF",
"USE_TFLITE": "OFF",
"USE_THREADS": "ON",
"USE_THRUST": "OFF",
"USE_UMA": "OFF",
"USE_VULKAN": "OFF"
}
vector_width: 128
!!!!!!! Segfault encountered !!!!!!!
File "build/src/ffi/backtrace.cc", line 154, in TVMFFISegFaultHandler
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/codegen_aarch64.cc", line 61, in tvm::codegen::CodeGenAArch64::SetTargetAttributes(llvm::Function*)
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/codegen_llvm.cc", line 287, in tvm::codegen::CodeGenLLVM::DeclareFunctionInternal(tvm::GlobalVar const&, tvm::tir::PrimFunc const&)
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/codegen_llvm.h", line 656, in void tvm::codegen::CodeGenLLVM::AddFunctionsOrdered<tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, void tvm::codegen::CodeGenLLVM::AddFunctionsOrdered<tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator>(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator)::'lambda'(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator)>(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, void tvm::codegen::CodeGenLLVM::AddFunctionsOrdered<tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator>(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator)::'lambda'(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator))
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/codegen_llvm.h", line 181, in void tvm::codegen::CodeGenLLVM::AddFunctionsOrdered<tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator>(tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator, tvm::ffi::Map<tvm::GlobalVar, tvm::BaseFunc, void>::iterator)
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/llvm_module.cc", line 356, in tvm::codegen::LLVMModuleNode::Init(tvm::IRModule const&, tvm::Target const&)
File "/Users/mshr/data/project/tvm-example/tvm/src/target/llvm/llvm_module.cc", line 664, in tvm::codegen::LLVMReflectionRegister()::$_0::operator()(tvm::IRModule, tvm::Target) const
File "<unknown>", line 0, in _PyEval_EvalFrameDefault
File "<unknown>", line 0, in PyEval_EvalCode
File "<unknown>", line 0, in run_eval_code_obj
File "<unknown>", line 0, in run_mod.llvm.17421610541250727766
File "<unknown>", line 0, in pyrun_file
File "<unknown>", line 0, in _PyRun_SimpleFileObject
File "<unknown>", line 0, in _PyRun_AnyFileObject
File "<unknown>", line 0, in pymain_run_file_obj
File "<unknown>", line 0, in pymain_run_file
File "<unknown>", line 0, in Py_RunMain
File "<unknown>", line 0, in pymain_main
File "<unknown>", line 0, in Py_BytesMain
/Users/mshr/.local/share/uv/python/cpython-3.13.11-macos-aarch64-none/lib/python3.13/multiprocessing/resource_tracker.py:400: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/mp-dtancvc1'}
warnings.warn(
Thanks a lot for the help !
- I'll look at this; I definitely would like to make it work properly!
- I have an OPI6 (SVE+SME support) and would like to see the tensorized kernels (expecting TFLOPS level).
- And it is time to fix tensorization for the transposed flavours of GEMM too.
I can only do it on the weekend; this week I am very busy.
In the meanwhile, if you wish, you can merge this as-is and I can fix it in a subsequent PR (will let you know Cc ).
Once again thanks for looking into this !
As per title.