[TT_ERROR] CUDA runtime error: an illegal memory access was encountered TurboTransformers/turbo_transformers/core/cuda_device_context.cpp:33 #191

@auspicious3000

Description

Below is a rough sketch of what I did:

```python
import torch.multiprocessing as mp

def inference(config):
    data_loader = get_loader(config)
    while True:
        for step in range(128):
            dec_outs, _ = turbo_decoder(current_pred,
                                        memory_bank,
                                        step,
                                        memory_lengths=memory_lengths)

ctx = mp.get_context("spawn")
p = ctx.Process(target=inference, args=(config,))  # args must be a tuple: (config,)
p.start()
p.join()
```

The process eventually dies with:

```
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```

I get the above error when using turbo_decoder to generate data for training. The error can appear at any iteration: sometimes after calling inference hundreds of times, sometimes only after thousands of calls.
It looks similar to #174, but after two days I have not found a real solution.

Hopefully you can shed some light on this.
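For reference, the spawn pattern itself works the same with the standard library's `multiprocessing`, which `torch.multiprocessing` wraps. Below is a minimal, CUDA-free sketch of that pattern; the `inference` worker and its `config` here are placeholders standing in for the turbo_decoder loop above, not part of the original report:

```python
import multiprocessing as mp

def inference(config, result_queue):
    # Hypothetical worker standing in for the turbo_decoder loop.
    result_queue.put(config["steps"] * 2)

if __name__ == "__main__":
    # "spawn" is required once the parent process has initialized CUDA;
    # a forked child would inherit a broken CUDA context.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    # Note the trailing comma: args must be a tuple, (config,) not (config).
    p = ctx.Process(target=inference, args=({"steps": 64}, q))
    p.start()
    p.join()
    print(q.get())  # → 128
```

With spawn, the child re-imports the main module, so the process-launching code must sit under the `if __name__ == "__main__":` guard and the target function must be importable at module level.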
