Below is a rough sketch of the code to explain what I did.
```python
import torch.multiprocessing as mp

def inference(config):
    data_loader = get_loader(config)
    while True:
        # run the decoder step by step (model setup and data handling omitted)
        for step in range(128):
            dec_outs, _ = turbo_decoder(current_pred,
                                        memory_bank,
                                        step,
                                        memory_lengths=memory_lengths)

if __name__ == "__main__":  # required when using the "spawn" start method
    ctx = mp.get_context("spawn")
    # args must be a one-element tuple: (config,) rather than (config)
    p = ctx.Process(target=inference, args=(config,))
    p.start()
    p.join()
```
```
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```
I got the above error when using `turbo_decoder` to generate data for training. The error can appear at any iteration: sometimes after calling `inference` hundreds of times, sometimes only after thousands of calls.
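For context, the generation side respawns the worker process for each round, roughly like this (a simplified sketch of my driver loop; `num_rounds` and the result handling are placeholders for the real logic):

```python
ctx = mp.get_context("spawn")
for round_idx in range(num_rounds):  # placeholder bound; the real loop runs much longer
    p = ctx.Process(target=inference, args=(config,))
    p.start()
    p.join()  # the error can surface in any of these rounds
```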
It looks similar to #174, but after two days I still have not found a real solution.
I hope you can shed some light on this.