🐛 Describe the bug
Installing colossalai with pip install colossalai==0.2.0+torch1.12cu11.3 -f https://release.colossalai.org succeeded, and importing it succeeded.
But colossalai check -i failed, and colossalai run xx also failed.
Logs:
(colos) [root@xxx hybrid_parallel]# colossalai check -i
Using /root/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Traceback (most recent call last):
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/kernel/__init__.py", line 4, in <module>
from colossalai._C import fused_optim
ImportError: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/_C/fused_optim.cpython-39-x86_64-linux-gnu.so)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/anaconda3/envs/colos/bin/colossalai", line 5, in <module>
from colossalai.cli import cli
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/__init__.py", line 1, in <module>
from .initialize import (
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/initialize.py", line 23, in <module>
from colossalai.engine.schedule import NonPipelineSchedule, PipelineSchedule, InterleavedPipelineSchedule, get_tensor_shape
fused_optim = FusedOptimBuilder().load()
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/kernel/op_builder/builder.py", line 77, in load
op_module = load(name=self.name,
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
File "/root/anaconda3/envs/colos/lib/python3.9/site-packages/torch/utils/_cpp_extension_versioner.py", line 15, in hash_source_files
with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/root/anaconda3/envs/colos/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/colossal_C_frontend.cpp'
(colos) [root@xxx hybrid_parallel]# strings /lib64/libm.so.6 | grep GLIBC
GLIBC_2.2.5
GLIBC_2.4
GLIBC_2.15
GLIBC_PRIVATE
GLIBC_2.15
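The same check can be scripted. Below is a minimal sketch, assuming the `strings /lib64/libm.so.6 | grep GLIBC` output has been captured as text; the helper names `glibc_versions` and `provides` are hypothetical and not part of Colossal-AI or any library. It only confirms that the `GLIBC_2.29` tag named in the ImportError is missing from the list.

```python
import re


def glibc_versions(strings_output: str) -> list[tuple[int, ...]]:
    """Collect the numeric GLIBC_x.y tags found in `strings` output (hypothetical helper)."""
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", strings_output)
    # De-duplicate and sort numerically, so 2.15 sorts after 2.4.
    return sorted({tuple(int(part) for part in v.split(".")) for v in found})


def provides(strings_output: str, required: str) -> bool:
    """True if the exact version tag `required` (e.g. 'GLIBC_2.29') is listed."""
    wanted = tuple(int(part) for part in required.removeprefix("GLIBC_").split("."))
    return wanted in glibc_versions(strings_output)


# Output copied from the log above.
output = """GLIBC_2.2.5
GLIBC_2.4
GLIBC_2.15
GLIBC_PRIVATE
GLIBC_2.15"""

print(glibc_versions(output))
print(provides(output, "GLIBC_2.29"))
```

With the output above, the highest tag listed is 2.15, so the prebuilt fused_optim extension cannot be loaded against this libm.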
So we need to install a newer glibc (the error asks for GLIBC_2.29), following https://blog.csdn.net/m0_37201243/article/details/123641552
After installing the newer glibc, colossalai check -i succeeded.
#### Installation Report ####
Colossal-AI version: 0.2.0
----------------------------
PyTorch Version: 1.12.0
PyTorch Version required by Colossal-AI: 1.12
PyTorch version match: ✓
----------------------------
System CUDA Version: 11.8
CUDA Version required by PyTorch: 11.3
CUDA Version required by Colossal-AI: 11.3
CUDA Version Match: x
----------------------------
CUDA Extension: ✓
But running hybrid_parallel still failed:
(colos) [colos@xxx hybrid_parallel]# colossalai run --nproc_per_node 2 train.py --config config.py
Error: failed to run torchrun --nproc_per_node=2 --nnodes=1 --node_rank=0 --rdzv_backend=c10d --rdzv_endpoint=127.0.0.1:29500 --rdzv_id=colossalai-default-job train.py --config config.py on 127.0.0.1
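The message above only says that the torchrun command failed and does not show the underlying error. One way to see it may be to run the same command directly and print its output. The following is a minimal sketch under that assumption: the flags are copied from the error message, and the helper names `build_torchrun_cmd` and `run_and_show` are hypothetical.

```python
import shlex
import subprocess
import sys


def build_torchrun_cmd(nproc_per_node: int, script: str, script_args: list[str]) -> list[str]:
    """Rebuild the torchrun command line shown in the error message (hypothetical helper)."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--nnodes=1",
        "--node_rank=0",
        "--rdzv_backend=c10d",
        "--rdzv_endpoint=127.0.0.1:29500",
        "--rdzv_id=colossalai-default-job",
        script,
        *script_args,
    ]


def run_and_show(cmd: list[str]) -> int:
    """Run a command, echo its stdout and stderr, and return its exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return result.returncode


cmd = build_torchrun_cmd(2, "train.py", ["--config", "config.py"])
print(shlex.join(cmd))
```

Calling `run_and_show(cmd)` from the hybrid_parallel directory should then surface the traceback from train.py, if there is one, instead of the one-line launcher error.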
Environment
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8
The conda env has cudatoolkit 11.3 installed.