
run in colab #46

@Mojtaba1215

After installing the libraries in Google Colab and training on my own images, I got the following output:

/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
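The deprecation warning above asks scripts that expect a `--local_rank` argument to read `os.environ['LOCAL_RANK']` instead. Below is a minimal sketch of that change. It assumes the training script currently takes `--local_rank` (its argument parsing is not shown in this issue), and the helper name `get_local_rank` is hypothetical:

```python
import argparse
import os


def get_local_rank(argv=None):
    """Return the local rank, preferring the LOCAL_RANK env var set by torchrun.

    Falls back to a --local_rank argument (torch.distributed.launch style),
    then to 0 for a plain single-process run.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Kept only for backward compatibility with torch.distributed.launch.
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores the script's other flags instead of erroring.
    args, _unknown = parser.parse_known_args(argv)

    env_rank = os.environ.get("LOCAL_RANK")
    if env_rank is not None:
        return int(env_rank)
    if args.local_rank is not None:
        return args.local_rank
    return 0
```

Calling `get_local_rank()` with no arguments parses `sys.argv`; the environment variable wins when both sources are present, which matches how torchrun passes the rank.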
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
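The warning says each process gets `OMP_NUM_THREADS=1` by default and asks for further tuning. One common starting point, shown here as an assumption rather than a PyTorch recommendation, is to split the CPU cores evenly across the local worker processes. torchrun appears to apply its default of 1 only when `OMP_NUM_THREADS` is not already set, so a value computed this way can be exported before launching. The helper name is hypothetical:

```python
import os


def omp_threads_per_process(nproc_per_node, cpu_count=None):
    """Split the available CPU cores evenly across local worker processes.

    Only a starting point; always returns at least 1 thread per process.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be >= 1")
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores // nproc_per_node)
```

The result would be exported as `OMP_NUM_THREADS` in the environment of the launcher process; with 8 local ranks, as in this log, a machine with only a few cores still ends up at 1 thread per process.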


Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
    env=env)
  File "/usr/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_styleswin.py", line 27, in <module>
    from models.discriminator import Discriminator
  File "/content/StyleSwin/models/discriminator.py", line 7, in <module>
    from op import FusedLeakyReLU, upfirdn2d
  File "/content/StyleSwin/op/__init__.py", line 4, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/content/StyleSwin/op/fused_act.py", line 19, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1518, in _jit_compile
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1626, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/Device.h:4,
                 from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                 from /usr/local/lib/python3.7/dist-packages/torch/include/torch/extension.h:6,
                 from /content/StyleSwin/op/fused_bias_act.cpp:4:
/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
   12 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
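The root cause in the compile log is `fatal error: Python.h: No such file or directory`: the build passes `-isystem /usr/include/python3.7m`, but no `Python.h` exists there. The hypothetical helper below is a quick check, assuming `sysconfig.get_path("include")` points at the same directory that `torch.utils.cpp_extension` uses for the Python headers (for the system `python3.7` in this log, that should be `/usr/include/python3.7m`):

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, found) for this interpreter's Python.h.

    include_dir is where the interpreter says its C headers live;
    found is True only if Python.h actually exists in that directory.
    """
    include_dir = sysconfig.get_path("include")
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)
```

Run it with the same `python3` that launches training. If `found` is `False`, the Python development headers are probably not installed for that interpreter; on Debian/Ubuntu images they usually come from a `pythonX.Y-dev` package (here presumably `python3.7-dev`), though the exact package name on the Colab image is an assumption.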

Traceback (most recent call last):
  File "train_styleswin.py", line 27, in <module>
    from models.discriminator import Discriminator
  File "/content/StyleSwin/models/discriminator.py", line 7, in <module>
    from op import FusedLeakyReLU, upfirdn2d
  File "/content/StyleSwin/op/__init__.py", line 4, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/content/StyleSwin/op/fused_act.py", line 19, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
ImportError: /root/.cache/torch_extensions/py37_cu117/fused/fused.so: cannot open shared object file: No such file or directory
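The `ImportError` for `fused.so` appears to come from the other ranks: they wait for one process to build the `fused` extension and then try to load `fused.so`, which was never produced because the build failed. After fixing the missing header, clearing the cached build directory before retrying is a reasonable precaution so no partial outputs or leftover lock file are reused. A minimal sketch; the helper name is hypothetical and the path is the one from the log:

```python
import shutil
from pathlib import Path


def clear_extension_build(build_dir):
    """Delete a cached torch extension build directory.

    The next import then triggers a fresh JIT build instead of reusing
    a failed one. Returns True if a directory was removed.
    """
    path = Path(build_dir).expanduser()
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

For this issue that would be `clear_extension_build("/root/.cache/torch_extensions/py37_cu117/fused")`, run once on a single process before relaunching training.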
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7343) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 756, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train_styleswin.py FAILED

Failures:
[1]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 7344)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 7345)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 7346)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 7347)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 7348)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 7349)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 7350)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 7343)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
