After installing the libraries in Google Colab and using my own images, I encountered the following error:
/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
env=env)
File "/usr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train_styleswin.py", line 27, in <module>
from models.discriminator import Discriminator
File "/content/StyleSwin/models/discriminator.py", line 7, in <module>
from op import FusedLeakyReLU, upfirdn2d
File "/content/StyleSwin/op/__init__.py", line 4, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/content/StyleSwin/op/fused_act.py", line 19, in <module>
os.path.join(module_path, "fused_bias_act_kernel.cu"),
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
keep_intermediates=keep_intermediates)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1518, in _jit_compile
is_standalone=is_standalone)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1626, in _write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/Device.h:4,
from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.7/dist-packages/torch/include/torch/extension.h:6,
from /content/StyleSwin/op/fused_bias_act.cpp:4:
/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "train_styleswin.py", line 27, in <module>
from models.discriminator import Discriminator
File "/content/StyleSwin/models/discriminator.py", line 7, in <module>
from op import FusedLeakyReLU, upfirdn2d
File "/content/StyleSwin/op/__init__.py", line 4, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/content/StyleSwin/op/fused_act.py", line 19, in <module>
os.path.join(module_path, "fused_bias_act_kernel.cu"),
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
keep_intermediates=keep_intermediates)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
ImportError: /root/.cache/torch_extensions/py37_cu117/fused/fused.so: cannot open shared object file: No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7343) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 756, in run
)(*cmd_args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train_styleswin.py FAILED
Failures:
[1]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 7344)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 7345)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 7346)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 7347)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 7348)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 7349)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 7350)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 7343)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
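
The first real failure in the log seems to be `fatal error: Python.h: No such file or directory`; the later `fused.so` ImportError looks like a consequence of that failed build. Below is a minimal sketch for checking whether the CPython development headers are visible to the interpreter. It assumes the header is expected in the include directory reported by `sysconfig` (the compile command above used `/usr/include/python3.7m`). The helper name `find_python_header` is my own, not part of StyleSwin or PyTorch.

```python
import os
import sysconfig


def find_python_header():
    """Return (include_dir, header_exists) for the running interpreter.

    include_dir is where this interpreter expects its C headers;
    header_exists tells whether Python.h is actually there.
    """
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, found = find_python_header()
    print(f"include dir: {include_dir}")
    print(f"Python.h present: {found}")
```

If this reports that `Python.h` is missing, the development headers are probably not installed for this Python version. On Debian/Ubuntu-based images they usually come from a package such as `python3.7-dev`, though I have not verified that on this Colab runtime.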