This repository was archived by the owner on Nov 17, 2023. It is now read-only.

CUDA: Check failed: e == cudaSuccess: misaligned address with 3-layer BERT pretraining  #19155

@szhengac

Description

While pretraining a 3-layer BERT model with GluonNLP 0.10 on one p3.24dn instance with 32GB GPU memory, I received CUDA: Check failed: e == cudaSuccess: misaligned address. With a total batch size of 128, training uses 11GB of GPU memory and no error occurs. But when I increase the total batch size slightly to 176, or double it to 256, I get the error. I have already cherry-picked #17767.

@sxjscience, you may want to try this setting with the numpy version.
