Below is a rough sketch of the code to explain what I did.
```python
import torch.multiprocessing as mp

def inference(config):
    data_loader = get_loader(config)
    while True:
        # run the decoder step by step (model setup and data handling omitted)
        for step in range(128):
            dec_outs, _ = turbo_decoder(current_pred,
                                        memory_bank,
                                        step,
                                        memory_lengths=memory_lengths)

if __name__ == "__main__":  # required when using the "spawn" start method
    ctx = mp.get_context("spawn")
    # args must be a one-element tuple: (config,) rather than (config)
    p = ctx.Process(target=inference, args=(config,))
    p.start()
    p.join()
```
```
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```
I got the above error when using `turbo_decoder` to generate data for training. The error can appear at any iteration: sometimes after calling `inference` hundreds of times, sometimes only after thousands of calls.
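For context, the generation side respawns the worker process for each round, roughly like this (a simplified sketch of my driver loop; `num_rounds` and the result handling are placeholders for the real logic):

```python
ctx = mp.get_context("spawn")
for round_idx in range(num_rounds):  # placeholder bound; the real loop runs much longer
    p = ctx.Process(target=inference, args=(config,))
    p.start()
    p.join()  # the error can surface in any of these rounds
```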
It looks similar to #174, but after two days I still have not found a real solution.
I hope you can shed some light on this.