This repository was archived by the owner on Jun 21, 2024. It is now read-only.

RuntimeError: No available kernel. Aborting execution. #9

Description

@zarandioon

When I run the inference logic using the following script, I get a `RuntimeError: No available kernel. Aborting execution.` error:

A100 GPU detected, using flash attention if input tensor is on cuda
  0%|          | 0/251 [00:00<?, ?it/s]
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:659.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:450.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:661.)
  out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:100.)
  out = F.scaled_dot_product_attention(
  0%|          | 0/251 [00:00<?, ?it/s]
Traceback (most recent call last):

... <truncated>

  File "/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py", line 100, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel.  Aborting execution.
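For what it's worth, the dtype warning above ("Expected query, key and value to all be of dtype: {Half, BFloat16}") suggests the fused flash/memory-efficient kernels only accept fp16/bf16 inputs, so with the math fallback runtime-disabled there is no kernel left for float32 tensors. A minimal sketch of what I think is happening (the shapes and the `sdp_kernel` usage here are illustrative, not taken from the PaLM code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64)
k, v = torch.randn_like(q), torch.randn_like(q)

# The math kernel handles float32 on any device:
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])

# On CUDA, the fused kernels want half precision. If the math fallback
# is disabled (as the "runtime disabled" warning above implies), float32
# inputs leave no eligible kernel and SDPA raises
# "RuntimeError: No available kernel. Aborting execution."
if torch.cuda.is_available():
    qh, kh, vh = (t.cuda().half() for t in (q, k, v))
    with torch.backends.cuda.sdp_kernel(enable_math=False):
        # half-precision inputs let the flash kernel dispatch
        out = F.scaled_dot_product_attention(qh, kh, vh)
```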

I tried installing the PyTorch nightly build, but that did not help:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

CUDA toolkit (nvcc) version:

/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
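(Side note, and an assumption on my part: the nightly wheel bundles its own CUDA runtime, so the system nvcc 11.1 above is not necessarily what PyTorch runs against. A quick way to check what the installed wheel was actually built with:)

```python
import torch

print(torch.__version__)        # e.g. 2.1.0.dev20230618+cu121
print(torch.version.cuda)       # CUDA version the wheel was built against
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # an A100 reports compute capability (8, 0)
    print(torch.cuda.get_device_capability(0))
```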

PyTorch version:

pip3 show torch
Name: torch
Version: 2.1.0.dev20230618+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/azureuser/PaLM/.venv/lib/python3.8/site-packages
Requires: filelock, pytorch-triton, sympy, networkx, jinja2, fsspec, typing-extensions
Required-by: torchvision, torchaudio, PaLM-rlhf-pytorch, lion-pytorch, accelerate

Any idea what could cause this?
